Abstract

The chase algorithm is a fundamental tool for query evaluation and
query containment under constraints, where the constraints are
(subclasses of) tuple-generating dependencies (TGDs) and equality-generating
dependencies (EGDs). So far, most of the research on this topic has focused
on cases where the chase procedure terminates, with some notable excep- 
tions. In this paper we take a general approach, and we propose large 
classes of TGDs under which the chase does not always terminate. Our 
languages, in particular, are inspired by guarded logic: we show that by 
enforcing syntactic properties on the form of the TGDs, we are able to 
ensure decidability of the problem of answering conjunctive queries de- 
spite the non-terminating chase. We provide tight complexity bounds for 
the problem of conjunctive query evaluation for several classes of TGDs. 
We then introduce EGDs, and provide a condition under which EGDs do 
not interact with TGDs, and therefore do not take part in query answer- 
ing. We show applications of our classes of constraints to the problem of 
answering conjunctive queries under F-Logic Lite, a recently introduced 
ontology language, and under prominent tractable Description Logics lan- 
guages. All the results in this paper immediately extend to the problem 
of conjunctive query containment. 



"This is the extended version of results by the same authors, published in the KR 2008 
Conference and in the DL 2008 Workshop.

1 Introduction



This paper studies a simple but fundamental logical language for ontological
reasoning and query answering: the language of tuple-generating dependencies
(TGDs). This formalism captures a wide variety of description logics (DLs),
including families of DLs that have so far been considered completely
different from each other, such as, on the one hand, OWL-based languages
(e.g., EL [5] and DL-Lite [13111]) and, on the other hand, object-based
languages rooted in the old frame paradigm (e.g., F-Logic Lite [17]). The
present paper is the full version of a conference paper [11] that has been
fruitfully employed and extended in various contexts, and has given rise to
the Datalog± family [2] of ontology languages. Here we give a full account of
the fundamental complexity results underlying the central and most general
fragment of this language. Subsequent work has focused mostly on restrictions
of the formalism [12], to obtain complexity results for special cases, and on
extensions based on different paradigms [TS].

Our work is also closely related to query answering and query containment
[24], which are central problems in database theory and knowledge
representation. In most cases they are mutually reducible. These problems are
especially interesting in the presence of integrity constraints (or
dependencies, in database parlance) on the schema; in such a setting, they
are relevant, for instance, to query optimization and schema integration
techniques [5] [351 SS]; in knowledge representation they have been used for
object classification, schema integration, service discovery, and more
[22] [44].

A practically relevant instance of the containment problem was first studied
by Johnson and Klug in [38] for functional and inclusion dependencies, and
later, for instance, in [21]. Several additional decidability results were
obtained by focusing on concrete applications. For instance, [20] considers
constraints specific to entity-relationship diagrams, and [17] adopts
constraints derived from a relevant subset of F-logic [33], called F-Logic
Lite.

Several works in the literature consider (variants or subclasses of)
tuple-generating dependencies (TGDs) for the purpose of reasoning and query
answering. A TGD is a Horn rule with existentially quantified variables in
the head; in fact, the first works dealing with this kind of extension named
the resulting language Datalog with value invention [47] [9]. More formally,
a TGD $\forall \mathbf{X} \forall \mathbf{Y}\, \Phi(\mathbf{X},\mathbf{Y})
\rightarrow \exists \mathbf{Z}\, \Psi(\mathbf{X},\mathbf{Z})$ is a
first-order formula, where $\Phi(\mathbf{X},\mathbf{Y})$ and
$\Psi(\mathbf{X},\mathbf{Z})$ are conjunctions of atoms over a relational
schema $\mathcal{R}$. If a TGD is not satisfied by a database instance $D$,
then some match of its body in $D$ cannot be extended to a match of its head
in $D$. It is possible to enforce a TGD by modifying $D$, adding new atoms
that satisfy the head. These new atoms contain labeled nulls at the
positions of the $\mathbf{Z}$ variables. The chase in the presence of a set
$\Sigma$ of TGDs is the iterative enforcement, until a fixpoint is reached,
of the TGDs in $\Sigma$. The result of the chase procedure, which we also
call chase, can be infinite in some cases. Dealing with an infinite chase
presents additional challenges with respect to the terminating chase. The
chase is a fundamental tool for answering queries in the presence of a
knowledge base constituted by a set of TGDs [18, 32]. In fact, the chase is a
representative of all models of the theory constituted by $D \cup \Sigma$.

In the present paper we do not focus on a specific logical theory; rather,
we tackle the fundamental challenge underlying several of the earlier
studies, among which [38] [20] [17]. All these works consider constraints in
the language of tuple-generating dependencies (TGDs) and equality-generating
dependencies (EGDs); they all adopt the chase technique, and all face the
problem that the chase procedure might not terminate, thus generating an
infinite result. We tackle the problem in a much more general way, that is,
by carving out a very large class of constraints for which the infinite chase
can be tamed.

In Section 3 we define the notions of sets of guarded TGDs (GTGDs) and of
weakly guarded sets of TGDs (WGTGDs). A TGD is guarded if its body contains
an atom, called guard, that covers all variables occurring in the body.
Weakly guarded TGDs are a generalization of guarded TGDs that require guards
to cover only the variables occurring at affected positions, i.e., positions
in predicates that may contain some fresh labeled nulls generated during the
chase. Note that inclusion dependencies (IDs) are trivially guarded TGDs. To
emphasize the importance of guards, we show (Theorem l2"Tj) that there is a
fixed set $\Sigma_u$ of TGDs that contains several GTGDs and a single
unguarded TGD, such that query evaluation under $\Sigma_u$ is undecidable.
However, we show that, under weakly guarded sets of TGDs, the (possibly
infinite) result of the chase has finite treewidth (Theorem 29), and we use
this fact together with well-known results about the generalized tree-model
property [331 EH] for a short proof that Boolean query evaluation and query
containment are decidable under weakly guarded sets of TGDs (and thus also
under GTGDs). Unfortunately, this decidability result does not allow us to
derive useful complexity bounds.
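The guardedness condition stated above is purely syntactic, so it can be checked directly on the body of a rule. The following Python sketch illustrates this under an encoding that is an assumption of the sketch (not part of the paper): atoms are pairs (predicate, tuple of terms), and variables are strings starting with an uppercase letter. Weak guardedness is not covered, since it additionally requires computing the affected positions.

```python
def is_var(term):
    # Assumption of this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def variables(atoms):
    """Set of variables occurring in a list of atoms (pred, args)."""
    return {t for _, args in atoms for t in args if is_var(t)}

def guards(body):
    """Body atoms that contain every variable occurring in the body."""
    all_vars = variables(body)
    return [atom for atom in body if variables([atom]) == all_vars]

def is_guarded(body):
    return len(guards(body)) > 0

# Body of an inclusion dependency: a single atom, hence trivially guarded.
id_body = [("emp", ("X", "Y"))]
# r(X, Y, Z) covers all body variables, so it is a guard.
guarded_body = [("r", ("X", "Y", "Z")), ("s", ("Y", "Z"))]
# No single atom contains X, Y and Z together.
unguarded_body = [("r", ("X", "Y")), ("r", ("Y", "Z"))]

print(is_guarded(id_body))         # True
print(is_guarded(guarded_body))    # True
print(is_guarded(unguarded_body))  # False
```

The last body is the typical join pattern that guardedness rules out: it links three variables without any atom containing all of them.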

In Section 4 we show lower complexity bounds for conjunctive query answering
under weakly guarded sets of TGDs. We prove, by Turing machine simulations,
that query evaluation under weakly guarded sets of TGDs is EXPTIME-hard in
case of a fixed set of TGDs, and 2EXPTIME-hard in case the TGDs are part of
the input.

In Section 5 we address upper complexity bounds for query answering under
weakly guarded sets of TGDs. Let us first remark that showing $D \cup \Sigma
\models Q$ is equivalent to showing that the theory $T = D \cup \Sigma \cup
\{\neg Q\}$ is unsatisfiable. Unfortunately, $T$ is in general not guarded,
because $Q$ is not, and because weakly guarded sets of TGDs are generally
non-guarded first-order sentences (while GTGDs are in guarded FO).
Therefore, we cannot (as one may think at first glance) directly or easily
use known results on guarded logics [33] [35] to derive complexity results
for query evaluation. We thus develop completely new algorithms by which we
prove that the problem in question is EXPTIME-complete in case of bounded
predicate arities, and even in case the set of TGDs is fixed, and
2EXPTIME-complete in general.

In Section 6 we derive complexity results for reasoning with GTGDs. In
the general case, the complexity is as for WGTGDs, but interestingly, when 
reasoning with a fixed set of dependencies (which is the usual setting in data 
exchange and in description logics), we get much better results: evaluating 
Boolean queries is NP-complete, and in PTIME in case the query is atomic. Recall 
that Boolean query evaluation is NP-hard even in case of a simple database 
without integrity constraints [24] . Therefore, the above NP upper bound for 
general Boolean queries is optimal, i.e., there cannot be a class of TGDs for 
which query evaluation (or query containment) is more efficient. 

In Section 7 we describe a semantic condition on weakly guarded sets of
TGDs. We prove that whenever a set of WGTGDs fulfills such a condition,
answering Boolean queries is in NP, and answering atomic queries, as well as
queries of bounded treewidth, is in PTIME.

Section 8 is a technical section where we extend our results to the case of
TGDs with multiple-atom heads; in fact, we devise all proofs for TGDs with 
single-atom heads, as this significantly simplifies proofs. The extension is trivial 
for all cases except for the case of bounded predicate arity, which we deal with 
in this section. 

Section 9 deals with equality-generating dependencies (EGDs), a
generalization of functional dependencies. Unfortunately, as shown in
[2"51 1501 1551 HOI I18j, query answering and many other problems become
undecidable in case we admit both TGDs and EGDs. Query answering remains
undecidable even if we mix the simplest class of guarded TGDs, namely
inclusion dependencies, with the simplest type of EGDs, namely functional
dependencies and, in particular, key dependencies [55] [50] [35] [TB]. In
Section 9 we present a sufficient semantic condition for sets of TGDs and
general EGDs. We call EGDs innocuous when, roughly speaking, their
application does not introduce new atoms through unification, but only
eliminates atoms. We show that innocuous EGDs can be simply ignored for
conjunctive query evaluation (or, equivalently, query containment testing).

The TGD-based ontology languages in this paper are part of the larger
family of ontology languages Datalog± [Hj. Our results subsume, as special
cases, the main decidability and NP-complexity results in [38], as well as
the decidability and complexity results on F-Logic Lite [17] and DL-Lite,
while being considerably more general. This is shown in Section 10.

The complexity results in this paper, together with some of their immediate
consequences, are summarized in Figure 1, where all complexity bounds are
tight, and $\Sigma$ denotes the set of TGDs. By "bounded width" we mean
bounded treewidth or even hypertree width [34]. Notice that complexity in the
case of fixed queries and fixed TGDs is the so-called data complexity, i.e.,
the complexity w.r.t. the data only, which is of particular interest in
database applications.

Figure 1: Summary of results.

2 Preliminaries

In this section we define the basic notions that we use throughout the paper. 

2.1 Relational model, alphabets and queries 

A relational schema $\mathcal{R}$ is a set of relational predicates, each
with its associated arity, that is, the number of its arguments. We denote by
$r/n$ a relational predicate $r$ of arity $n$; $arity(r)$ denotes the arity
of $r$. Henceforth, we will always refer to a relational schema
$\mathcal{R}$, assuming that database instances (also called databases),
queries and constraints (dependencies) use predicates in $\mathcal{R}$. The
schema $\mathcal{R}$ will sometimes be omitted for the sake of brevity.

We introduce the following pairwise disjoint sets of symbols: (i) an infinite
set $\Delta$ of constants, which constitute the "normal" domain of the
instances of a database schema $\mathcal{R}$; (ii) an infinite set
$\Delta_N$ of labeled nulls, which will be used as "fresh" Skolem terms;
these are a sort of placeholders for unknown constants; (iii) an infinite set
$\Delta_V$ of variables, which are used in queries. We also assume a
lexicographic order on $\Delta \cup \Delta_N$, with every symbol in
$\Delta_N$ following all symbols in $\Delta$. Sets of variables (or
sequences, when the order is relevant) will be denoted by $\mathbf{X}$, with
$\mathbf{X} = X_1, \ldots, X_k$ for some $k$. The notation $\exists
\mathbf{X}$ indicates $\exists X_1 \ldots \exists X_k$, and the same holds
for the universal quantifier $\forall$.

An instance of a relational predicate $r/n$ is a set of atoms of the form
$r(c_1, \ldots, c_n)$, where $\{c_1, \ldots, c_n\} \subseteq \Delta \cup
\Delta_N$. Such atoms are also called ground atoms, facts or tuples. When the
fact $r(c_1, \ldots, c_n)$ is true, we say that the tuple $(c_1, \ldots,
c_n)$ is in the instance of $r$ (or, briefly, is in $r$). An instance of the
relational schema $\mathcal{R} = \{r_1, \ldots, r_m\}$ is the union of the
instances of $r_1, \ldots, r_m$. In the following, database instances (or
simply databases) will be considered to have values in $\Delta \cup
\Delta_N$, if not otherwise stated. When such databases are treated as
first-order formulae, each labeled null is viewed as an existential variable
with the same name. For instance, $r(a, z_1, z_2, z_1)$, where the $z_i$ are
nulls in $\Delta_N$ and $a \in \Delta$, is treated as $\exists z_1 \exists
z_2\, r(a, z_1, z_2, z_1)$. If confusion does not arise, we will omit the
existential quantifiers.

Let $A$ be a sequence of atoms $(a_1, \ldots, a_k)$ or a conjunction of
atoms $a_1 \wedge \ldots \wedge a_k$: we denote by $atoms(A)$ the set of the
atoms in $A$, that is, $atoms(A) = \{a_1, \ldots, a_k\}$. Given an atom $a$,
ground or non-ground, the domain of $a$, denoted by $dom(a)$, is the set of
all values (variables, constants or labeled nulls) that appear as arguments
in $a$; given a set of atoms $A$, we define $dom(A) = \bigcup_{a \in A}
dom(a)$; we adopt the same notation when $A$ is a sequence or a conjunction
(e.g., the body of a query) of atoms: $dom(A) = dom(atoms(A)) = \bigcup_{a
\in atoms(A)} dom(a)$. Given an atom $a$, we denote by $var(a)$ the set of
variables in $a$; if $A$ is a set of atoms, $var(A)$ is straightforwardly
defined as $var(A) = \bigcup_{a \in A} var(a)$. Again, if $A$ is a sequence
or a conjunction of atoms, we define $var(A) = var(atoms(A)) = \bigcup_{a \in
atoms(A)} var(a)$.

An $n$-ary conjunctive query (CQ) $Q$ over a schema $\mathcal{R}$ is a
formula of the form $q(X_1, \ldots, X_n) \leftarrow \Phi(\mathbf{X})$, where
$q$ is a predicate not appearing in $\mathcal{R}$, all symbols in
$\mathbf{X}$ are in $\Delta_V \cup \Delta$, all the variables $X_1, \ldots,
X_n$ appear in $\mathbf{X}$, and $\Phi(\mathbf{X})$, called the body of $Q$,
is a conjunction of atoms constructed with predicates from $\mathcal{R}$. The
arity of a query is the arity of its head predicate $q$; if $q$ has arity 0,
then the query is called Boolean (BCQ, Boolean conjunctive query). In the
following, in the case of Boolean queries, it will be convenient not to
represent the head predicate and the conjunction among the atoms, and to
represent a query as the set of atoms in $\Phi(\mathbf{X})$. In the rest of
the paper, if not otherwise stated, we assume that queries contain no
constants; it is easily seen that every instance of the problem of query
answering for CQs can be turned in polynomial time into an equivalent
instance of the same problem for constant-free CQs. Moreover, we will
sometimes refer to conjunctive queries simply as "queries". Given a
conjunctive query $Q$, its size $|Q|$ denotes the number of its atoms.

Given a relational schema $\mathcal{R}$, a position $r[k]$ in $\mathcal{R}$
is identified by a predicate $r \in \mathcal{R}$ and a natural number $k$,
with $1 \leq k \leq arity(r)$: $k$ identifies the $k$-th argument of $r$,
assuming an ordering on the arguments of every relational predicate.

2.2 Homomorphisms 

A mapping from a set of symbols $S_1$ to another set of symbols $S_2$ is a
function $\mu : S_1 \rightarrow S_2$ defined as follows: (i) $\emptyset$
(the empty mapping) is a mapping; (ii) if $\mu_0$ is a mapping, then $\mu_0
\cup \{X \rightarrow Y\}$, where $X \in S_1$ and $Y \in S_2$, is a mapping if
$\mu_0$ does not already contain some $X \rightarrow Y'$ with $Y \neq Y'$. If
$X \rightarrow Y$ is in a mapping $\mu$, we write $\mu(X) = Y$. The notion of
mapping is naturally extended to atoms as follows. If $a = r(c_1, \ldots,
c_n)$ is an atom and $\mu$ a mapping, we define $\mu(a) = r(\mu(c_1), \ldots,
\mu(c_n))$. For a set of atoms $A = \{a_1, \ldots, a_m\}$, we define $\mu(A)
= \{\mu(a_1), \ldots, \mu(a_m)\}$. The set of atoms $\mu(A)$ is also called
the image of $A$ w.r.t. $\mu$. For a conjunction of atoms $C = a_1 \wedge
\ldots \wedge a_m$, we use $\mu(C)$ to denote the set of atoms
$\mu(atoms(C))$, that is, $\mu(C) = \{\mu(a_1), \ldots, \mu(a_m)\}$.

A homomorphism from a set of atoms $A_1$ to another set of atoms $A_2$, with
$A_1 \cup A_2 \subseteq \Delta \cup \Delta_N \cup \Delta_V$, is a mapping
$\mu$ from $\Delta \cup \Delta_N \cup \Delta_V$ to $\Delta \cup \Delta_N \cup
\Delta_V$ such that the following conditions hold: (1) if $c \in \Delta$
then $\mu(c) = c$; (2) if $c \in \Delta_N$ then $\mu(c) \in \Delta \cup
\Delta_N$; (3) if the atom $a$ is in $A_1$, then the atom $\mu(a)$ is in
$A_2$, or equivalently $\mu(A_1) \subseteq A_2$; in this case, we say that
$A_1$ maps onto $A_2$ via $\mu$.

The notion of homomorphism also serves to define the notion of answer to
conjunctive queries. The answer to a conjunctive query $Q$ of the form
$q(X_1, \ldots, X_n) \leftarrow \Phi(\mathbf{X})$ over a database instance
$D$, denoted by $Q(D)$, is defined as follows: an atom $q(\mathbf{t})$, with
$\mathbf{t} \in \Delta^n$, is in $Q(D)$ iff there exists a homomorphism
$\mu$ that maps $\Phi(\mathbf{X})$ to atoms of $D$, and $(X_1, \ldots, X_n)$
to $\mathbf{t}$. Notice that only null-free tuples, i.e., tuples made of
constants in $\Delta$, are allowed to be in the answer. A Boolean
conjunctive query $Q$ is said to have a positive answer on a database $D$ iff
$q()$ (the atom with zero arguments) is in $Q(D)$; otherwise, it is said to
have a negative answer.
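As an illustration of the two definitions above, the following Python sketch enumerates homomorphisms by backtracking and computes $Q(D)$ on a small instance. The encoding is an assumption of this sketch and not part of the formal development: atoms are pairs (predicate, tuple of terms), variables are strings starting with an uppercase letter, labeled nulls are strings starting with an underscore, and all other strings are constants.

```python
def is_var(term):
    # Assumption of this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def is_null(term):
    # Assumption of this sketch: labeled nulls start with an underscore.
    return term.startswith("_")

def homomorphisms(atoms, instance, partial=None):
    """Yield each extension of `partial` mapping every atom to an atom of `instance`."""
    partial = dict(partial or {})
    if not atoms:
        yield partial
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        mu = dict(partial)
        # Variables must be bound consistently; constants map to themselves.
        if all(mu.setdefault(a, c) == c if is_var(a) else a == c
               for a, c in zip(args, fact_args)):
            yield from homomorphisms(rest, instance, mu)

def answers(head_vars, body, instance):
    """Q(D): images of the head variables, restricted to null-free tuples."""
    result = set()
    for mu in homomorphisms(body, instance):
        t = tuple(mu[x] for x in head_vars)
        if not any(is_null(c) for c in t):
            result.add(t)
    return result

D = {("r", ("a", "b")), ("r", ("b", "_z1")), ("s", ("b",))}

# q(X) <- r(X, Y), s(Y)
print(answers(("X",), [("r", ("X", "Y")), ("s", ("Y",))], D))   # {('a',)}
# q(Y) <- r(X, Y): the tuple containing the null _z1 is discarded
print(answers(("Y",), [("r", ("X", "Y"))], D))                  # {('b',)}
# Boolean query: positive answer iff the empty tuple is returned
print(answers((), [("r", ("X", "Y")), ("r", ("Y", "Z"))], D))   # {()}
```

Note that nulls may be used as join values inside the body (as in the Boolean query above), but never appear in an answer tuple.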

2.3 Relational dependencies 

A central notion in this work is that of database dependencies (or constraints), 
which we consider in the context of a relational schema. In the relational 
model, among the most popular dependencies are tuple-generating dependen- 
cies (TGDs), which are generalizations of inclusion dependencies pQ. 

Definition 1. Given a relational schema $\mathcal{R}$, a TGD $\sigma$ is a
first-order formula of the form $\forall \mathbf{X} \forall \mathbf{Y}\,
\Phi(\mathbf{X},\mathbf{Y}) \rightarrow \exists \mathbf{Z}\,
\Psi(\mathbf{X},\mathbf{Z})$, where $\Phi(\mathbf{X},\mathbf{Y})$ and
$\Psi(\mathbf{X},\mathbf{Z})$ are conjunctions of atoms over $\mathcal{R}$,
called body and head of the TGD, and denoted by $body(\sigma)$ and
$head(\sigma)$, respectively. Such a dependency is satisfied in a database
$D$ for $\mathcal{R}$ if, whenever there is a homomorphism $h$ that maps the
atoms of $\Phi(\mathbf{X},\mathbf{Y})$ to atoms of $D$, there exists an
extension $h_2$ of $h$ (i.e., $h_2 \supseteq h$) that maps the atoms of
$\Psi(\mathbf{X},\mathbf{Z})$ to atoms of $D$.

To simplify the notation, we will usually omit the quantifiers in TGDs. TGDs 
will be sometimes called rules in the rest of the paper. 

2.4 The chase 

The chase process was introduced in order to enable checking implication of
dependencies [35], and later also for checking query containment [35], and
for answering queries on incomplete data under relational dependencies [T5].
Informally, the chase procedure is a process of repairing a database with
respect to a set of database dependencies, so that the result of the chase
satisfies the dependencies. By "chase" we may refer either to the chase
procedure or to its output. The chase works on a database through the
so-called TGD chase rule, which defines the result of the application of a
TGD. TGDs are applicable in two flavors: oblivious and restricted.

Definition 2 (Oblivious applicability). Consider a relational instance $B$
for a schema $\mathcal{R}$, with values in $\Delta \cup \Delta_N$, and a TGD
$\sigma$ on $\mathcal{R}$ of the form $\Phi(\mathbf{X},\mathbf{Y})
\rightarrow \exists \mathbf{Z}\, \Psi(\mathbf{X},\mathbf{Z})$. We say that
$\sigma$ is obliviously applicable to $B$ if there exists a homomorphism $h$
that maps the atoms of $\Phi(\mathbf{X},\mathbf{Y})$ to atoms of $B$.

Definition 3 (Restricted applicability). Consider a relational instance $B$
for a schema $\mathcal{R}$, with values in $\Delta \cup \Delta_N$, and a TGD
$\sigma$ on $\mathcal{R}$ of the form $\Phi(\mathbf{X},\mathbf{Y})
\rightarrow \exists \mathbf{Z}\, \Psi(\mathbf{X},\mathbf{Z})$. We say that
$\sigma$ is restrictedly applicable to $B$ if there exists a homomorphism $h$
that maps the atoms of $\Phi(\mathbf{X},\mathbf{Y})$ to atoms of $B$, and
there exists no extension $h'$ of $h$ (i.e., no homomorphism such that $h'
\supseteq h$) that maps the atoms of $\Psi(\mathbf{X},\mathbf{Z})$ to atoms
of $B$.

The applicability called "oblivious" above has this denomination because 
it "forgets" to check whether the TGD is already satisfied. The "restricted" 
(non-oblivious) applicability instead requires the TGD not to be satisfied. 

TGD Chase Rule. Let $\sigma \in \Sigma$ be applicable to an instance $B$ via
a homomorphism $h$, and let $h_1$ be a homomorphism that extends $h$ as
follows: for each $X_i \in \mathbf{X} \cup \mathbf{Y}$, $h_1(X_i) = h(X_i)$;
for each $Z_j \in \mathbf{Z}$, $h_1(Z_j) = z_j$, where $z_j$ is a "fresh"
null, i.e., $z_j \in \Delta_N$, and $z_j$ lexicographically follows all
other labeled nulls already introduced. The application of $\sigma$ on $B$
adds to $B$ all atoms in $h_1(\Psi(\mathbf{X},\mathbf{Z}))$ that are not
already in $B$.
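The difference between the two notions of applicability, and the effect of the chase rule, can be seen on a small instance. The sketch below is only an illustration under assumptions of its own: atoms are pairs (predicate, tuple of terms), variables start with an uppercase letter, and fresh nulls are generated as `_z1`, `_z2`, and so on (the function names are hypothetical, chosen for this example).

```python
import itertools

def is_var(term):
    # Assumption of this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def homomorphisms(atoms, instance, partial=None):
    """Yield each extension of `partial` mapping every atom to an atom of `instance`."""
    partial = dict(partial or {})
    if not atoms:
        yield partial
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        mu = dict(partial)
        if all(mu.setdefault(a, c) == c if is_var(a) else a == c
               for a, c in zip(args, fact_args)):
            yield from homomorphisms(rest, instance, mu)

def oblivious_triggers(body, instance):
    """Homomorphisms witnessing oblivious applicability (Definition 2)."""
    return list(homomorphisms(body, instance))

def restricted_triggers(body, head, instance):
    """Homomorphisms witnessing restricted applicability (Definition 3)."""
    return [h for h in homomorphisms(body, instance)
            if next(homomorphisms(head, instance, h), None) is None]

def apply_chase_rule(instance, head, h, fresh):
    """Extend h with fresh nulls for the existential variables and add the head atoms."""
    h1 = dict(h)
    for _pred, args in head:
        for term in args:
            if is_var(term) and term not in h1:
                h1[term] = next(fresh)
    return instance | {(p, tuple(h1.get(t, t) for t in args)) for p, args in head}

fresh_nulls = (f"_z{i}" for i in itertools.count(1))
body, head = [("emp", ("X",))], [("mgr", ("X", "Z"))]   # emp(X) -> exists Z mgr(X, Z)
B = {("emp", ("a",)), ("emp", ("c",)), ("mgr", ("a", "b"))}

obl = oblivious_triggers(body, B)          # X=a and X=c
res = restricted_triggers(body, head, B)   # only X=c: emp(a) already has a manager
B2 = apply_chase_rule(B, head, res[0], fresh_nulls)
print(len(obl), res, sorted(B2 - B))
```

On this instance the rule is obliviously applicable via two homomorphisms but restrictedly applicable via one only; applying it for `X=c` adds the single atom `mgr(c, _z1)`.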

The TGD chase rule as defined above is used as the basic building block to 
construct the chase of a database under a set of TGDs. Depending on which 
notion of applicability is used, we get the oblivious or the restricted chase. We 
say that the TGD chase rule is applied obliviously or restrictedly, respectively. 
How the chase procedure is performed is inductively defined below, together 
with the important notion of derivation level of an atom in the chase. 

Definition 4. Let $D$ be a database and $\Sigma$ a set of TGDs.

• The oblivious (resp., restricted) chase up to derivation level 0, denoted
$Ochase^0(D,\Sigma)$ (resp., $Rchase^0(D,\Sigma)$), is defined as $D$.

• The oblivious (resp., restricted) chase up to derivation level $k$, denoted
$Ochase^k(D,\Sigma)$ (resp., $Rchase^k(D,\Sigma)$), with $k \geq 1$, is
constructed as follows. Let $I_1, \ldots, I_m$ be all possible images of
bodies of TGDs in $\Sigma$, each $I_i$ being relative to the corresponding
homomorphism, such that (a) for all $i$ with $1 \leq i \leq m$, $I_i
\subseteq Ochase^{k-1}(D,\Sigma)$ (resp., $I_i \subseteq
Rchase^{k-1}(D,\Sigma)$); (b) the highest level of an atom in each $I_i$ is
$k - 1$. Then, obliviously (resp., restrictedly) apply the TGD chase rule on
$Ochase^{k-1}(D,\Sigma)$ (resp., $Rchase^{k-1}(D,\Sigma)$), according to a
deterministic execution strategy (e.g., using a linear and lexicographic
order for the TGDs and homomorphisms, respectively). Assign derivation level
$k$ to every newly introduced atom.

• The oblivious (resp., restricted) chase $Ochase(D,\Sigma)$ (resp.,
$Rchase(D,\Sigma)$) is then defined as the limit of $Ochase^k(D,\Sigma)$
(resp., $Rchase^k(D,\Sigma)$) for $k \rightarrow \infty$.

It is easy to see that, in the presence of existential variables in the head of 
TGDs, the chase might be infinite. 
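To make the level-wise construction and the possibility of an infinite chase concrete, the following Python sketch computes the oblivious chase up to a given derivation level. It is a simplified illustration, not the procedure analysed in the paper: instead of testing condition (b) on derivation levels, it remembers which (TGD, homomorphism) pairs have already been applied, which selects the same triggers at each level for the oblivious chase. The encoding of atoms, variables (uppercase initial) and nulls (`_z1`, `_z2`, ...) is an assumption of the sketch.

```python
import itertools

def is_var(term):
    # Assumption of this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def homomorphisms(atoms, instance, partial=None):
    """Yield each extension of `partial` mapping every atom to an atom of `instance`."""
    partial = dict(partial or {})
    if not atoms:
        yield partial
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        mu = dict(partial)
        if all(mu.setdefault(a, c) == c if is_var(a) else a == c
               for a, c in zip(args, fact_args)):
            yield from homomorphisms(rest, instance, mu)

def oblivious_chase(database, tgds, max_level):
    """Return a dict atom -> derivation level, for levels 0..max_level."""
    level = {atom: 0 for atom in database}
    fresh = (f"_z{i}" for i in itertools.count(1))
    applied = set()  # (TGD index, homomorphism) pairs already used
    for k in range(1, max_level + 1):
        current = list(level)  # the chase up to level k - 1
        new_atoms = []
        for i, (body, head) in enumerate(tgds):
            # Deterministic strategy: TGDs in list order, homomorphisms sorted.
            triggers = sorted(homomorphisms(body, current),
                              key=lambda h: sorted(h.items()))
            for h in triggers:
                key = (i, tuple(sorted(h.items())))
                if key in applied:
                    continue
                applied.add(key)
                h1 = dict(h)
                for _pred, args in head:
                    for term in args:
                        if is_var(term) and term not in h1:
                            h1[term] = next(fresh)
                new_atoms += [(p, tuple(h1.get(t, t) for t in args))
                              for p, args in head]
        if not new_atoms:
            break  # fixpoint reached: the chase is finite
        for atom in new_atoms:
            level.setdefault(atom, k)
    return level

# r(X, Y) -> exists Z r(Y, Z): every level adds one atom, so the chase is infinite.
sigma = [([("r", ("X", "Y"))], [("r", ("Y", "Z"))])]
for atom, lvl in sorted(oblivious_chase({("r", ("a", "b"))}, sigma, 3).items(),
                        key=lambda item: item[1]):
    print(lvl, atom)
```

With the single rule above and the database containing only `r(a, b)`, the chase up to level $k$ contains exactly $k + 1$ atoms, forming an unbounded chain `r(a, b)`, `r(b, _z1)`, `r(_z1, _z2)`, and so on.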

2.5 Query answering under TGDs and the chase 

We now define the notion of query answering under TGDs. A similar notion is
used in data exchange [32] [35] and in query answering over incomplete data
[18]. Given an incomplete database, i.e., a database that does not satisfy
all the constraints in $\Sigma$, we first define the set of completions
(a.k.a. repairs [3]) of that database, which we call solutions.

Definition 5. Consider a relational schema $\mathcal{R}$, a set of TGDs
$\Sigma$, and a database instance $D$ for $\mathcal{R}$. The set of
instances $B$ such that $B \models \Sigma \cup D$ is called the set of
solutions of $D$ given $\Sigma$, and is denoted by $sol(D, \Sigma)$.

The following is the definition of the problem, which we denote by CQAns, 
of answering conjunctive queries under TGDs. 

Definition 6. Consider a relational schema $\mathcal{R}$, a set of TGDs
$\Sigma$, a database instance $D$ for $\mathcal{R}$, and a conjunctive query
$Q$ on $\mathcal{R}$ with head predicate $q$. The answer to $Q$ on $D$ given
$\Sigma$, denoted by $ans(Q, D, \Sigma)$, is the set of atoms
$q(\mathbf{t})$ such that for every $B \in sol(D, \Sigma)$, $q(\mathbf{t})
\in Q(B)$ holds.

When $q(\mathbf{t}) \in ans(Q, D, \Sigma)$, we also write $D \cup \Sigma \cup
\{Q\} \models q(\mathbf{t})$.

Containment of queries over relational databases has long been considered a 
fundamental problem in query optimization, especially query containment under 
constraints such as TGDs. Below we formally define this problem, which we call 
CQCont. 

Definition 7. Consider a relational schema $\mathcal{R}$, a set $\Sigma$ of
TGDs on $\mathcal{R}$, and two conjunctive queries $Q_1, Q_2$ expressed over
$\mathcal{R}$. We say that $Q_1$ is contained in $Q_2$ under $\Sigma$,
denoted by $Q_1 \subseteq_\Sigma Q_2$, if for every database instance $B$ for
$\mathcal{R}$ such that $B \models \Sigma$ we have $Q_1(B) \subseteq Q_2(B)$.

Query containment and answering under TGDs as defined above are closely 
related to the notion of chase, and very close to each other, as we explain in the 
following. 

Footnote to Definition 6: when writing $D \cup \Sigma \cup \{Q\} \models
q(\mathbf{t})$, we interpret $Q$ as a rule (TGD) of the form $body(Q)
\rightarrow head(Q)$.

Theorem 8 ([2]). Consider a relational schema $\mathcal{R}$, a set $\Sigma$
of TGDs on $\mathcal{R}$, a database instance $D$ for $\mathcal{R}$, a
conjunctive query $Q$ with $n$-ary head predicate $q$, and an $n$-ary ground
tuple $\mathbf{t}$; we have that $q(\mathbf{t}) \in ans(Q, D, \Sigma)$ iff
there exists a homomorphism $h$ such that $h(body(Q)) \subseteq
Rchase(D,\Sigma)$ and $h(head(Q)) = q(\mathbf{t})$.

Notice that the fact that $h(body(Q)) \subseteq Rchase(D,\Sigma)$ and
$h(head(Q)) = q(\mathbf{t})$ is equivalent to saying that $q(\mathbf{t}) \in
Q(Rchase(D,\Sigma))$, or equivalently $Rchase(D,\Sigma) \cup \{Q\} \models
q(\mathbf{t})$. The result of Theorem 8 is important, and it holds because
the (possibly infinite) restricted chase is a universal solution [32], i.e.,
a representative of all databases in $sol(D,\Sigma)$. More formally, a
universal solution for $D$ under $\Sigma$ is a (possibly infinite) database
instance $U$ such that, for every $B \in sol(D,\Sigma)$, there exists a
homomorphism that maps $U$ onto $B$. In [51] it is shown that the chase
constructed with respect to TGDs is defined also when it is infinite, and
that it is a universal solution.

Consider a relational schema $\mathcal{R}$, a set $\Sigma$ of TGDs on
$\mathcal{R}$, and two queries $Q_1, Q_2$ on $\mathcal{R}$. Let $\lambda$ be
a "freezing" homomorphism for $Q_1$, i.e., a homomorphism that maps every
distinct variable in $Q_1$ into a distinct labeled null in $\Delta_N$. Then
we say that $\lambda(body(Q_1))$ is a set of atoms obtained by freezing the
atoms in the body of $Q_1$.

Theorem 9. Consider a relational schema $\mathcal{R}$, a set $\Sigma$ of
TGDs on $\mathcal{R}$, and two conjunctive queries $Q_1, Q_2$ on
$\mathcal{R}$. We have that $Q_1 \subseteq_\Sigma Q_2$ iff
$\lambda(head(Q_1)) \in Q_2(Rchase(\lambda(body(Q_1)), \Sigma))$, where
$\lambda$ is a freezing homomorphism for $Q_1$.
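The containment test of Theorem 9 can be illustrated on an example where the chase terminates. The Python sketch below freezes $Q_1$, chases the frozen body, and evaluates $Q_2$ on the result. Several choices are assumptions of the sketch rather than part of the theorem: the encoding of atoms and variables, the representation of frozen variables as fresh constants named `c_X` (so that frozen values are allowed in answers), and a `max_rounds` cut-off, which makes the procedure only a bounded approximation when the chase is infinite.

```python
import itertools

def is_var(term):
    # Assumption of this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def homomorphisms(atoms, instance, partial=None):
    """Yield each extension of `partial` mapping every atom to an atom of `instance`."""
    partial = dict(partial or {})
    if not atoms:
        yield partial
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        mu = dict(partial)
        if all(mu.setdefault(a, c) == c if is_var(a) else a == c
               for a, c in zip(args, fact_args)):
            yield from homomorphisms(rest, instance, mu)

def chase(instance, tgds, max_rounds=10):
    """Restricted chase, cut off after max_rounds rounds (it need not terminate)."""
    instance = set(instance)
    fresh = (f"_z{i}" for i in itertools.count(1))
    for _round in range(max_rounds):
        changed = False
        for body, head in tgds:
            for h in list(homomorphisms(body, instance)):
                if next(homomorphisms(head, instance, h), None) is not None:
                    continue  # this trigger is already satisfied
                h1 = dict(h)
                for _pred, args in head:
                    for term in args:
                        if is_var(term) and term not in h1:
                            h1[term] = next(fresh)
                instance |= {(p, tuple(h1.get(t, t) for t in args))
                             for p, args in head}
                changed = True
        if not changed:
            break
    return instance

def freeze(term):
    # Frozen variables become fresh constants (an assumption of this sketch).
    return "c_" + term if is_var(term) else term

def contained_in(q1, q2, tgds):
    """Test Q1 contained in Q2 under tgds; queries are (head_vars, body) pairs."""
    (head1, body1), (head2, body2) = q1, q2
    frozen_body = {(p, tuple(freeze(t) for t in args)) for p, args in body1}
    frozen_head = tuple(freeze(x) for x in head1)
    result = chase(frozen_body, tgds)
    return any(tuple(h[x] for x in head2) == frozen_head
               for h in homomorphisms(body2, result))

q1 = (("X",), [("emp", ("X",))])        # q(X) <- emp(X)
q2 = (("X",), [("mgr", ("X", "Y"))])    # q(X) <- mgr(X, Y)
sigma = [([("emp", ("X",))], [("mgr", ("X", "Z"))])]

print(contained_in(q1, q2, sigma))  # True: every employee has a manager
print(contained_in(q1, q2, []))     # False without the constraint
print(contained_in(q2, q1, sigma))  # False
```

The example shows how a constraint can create a containment that does not hold over unconstrained databases.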

From the previous results, straightforwardly obtainable from [38] [51], we
easily conclude the following well-known result.

Corollary 10. The problems CQAns and CQCont are mutually PTIME-reducible.

2.6 Oblivious vs. restricted chase

As observed in [35] in the case of functional and inclusion dependencies,
things are more complicated if the restricted chase is used in place of the
oblivious one, since the applicability of a TGD depends on the presence of
other atoms previously added to the database by the chase. Indeed, the
restricted chase of a database $D$ with respect to a set of TGDs $\Sigma$ is
universal for $D$ under $\Sigma$, i.e., there exists a homomorphism from the
restricted chase to every solution, including the oblivious chase. However,
it is technically much easier to use the oblivious chase, and it can be used
in lieu of the restricted chase because, as we shall prove now, the oblivious
chase is also universal. This result, to the best of our knowledge, has never
been explicitly stated. It can be proved with a technique similar to that of
[22] for the terminating chase; however, for the sake of completeness, we
present a complete proof here.

Theorem 11. Consider a set $\Sigma$ of TGDs on a relational schema
$\mathcal{R}$, and let $D$ be a database on $\mathcal{R}$. Then there exists
a homomorphism $\mu$ such that $\mu(Ochase(D,\Sigma)) \subseteq
Rchase(D,\Sigma)$.

Proof. The proof goes by induction on the number $m$ of applications of the
TGD chase rule in the construction of the oblivious chase
$Ochase(D,\Sigma)$. We want to prove that for all $m$ with $m \geq 0$ there
exists a homomorphism from $Ochase^m(D,\Sigma)$ to $Rchase(D,\Sigma)$.

Base case. In the base case, where $m = 0$, no TGD rule has yet been
applied, therefore $Ochase^0(D,\Sigma) = D \subseteq Rchase(D,\Sigma)$, so
the existence of a homomorphism from $Ochase^0(D,\Sigma)$ to
$Rchase(D,\Sigma)$ is witnessed by the identity homomorphism $\mu_0$.

Inductive case. Assume we have applied $m$ times the TGD chase rule,
obtaining $Ochase^m(D,\Sigma)$. Now, by the induction hypothesis, there
exists a homomorphism $\mu_m$ that maps $Ochase^m(D,\Sigma)$ into
$Rchase(D,\Sigma)$. Consider the $(m+1)$-th application of the TGD chase
rule, for the TGD $\Phi(\mathbf{X},\mathbf{Y}) \rightarrow \exists
\mathbf{Z}\, \Psi(\mathbf{X},\mathbf{Z})$. This means, by definition of
applicability of a TGD, that there is a homomorphism $\lambda_O$ that maps
$\Phi(\mathbf{X},\mathbf{Y})$ to atoms of $Ochase^m(D,\Sigma)$; as a
consequence, $\lambda_O$ is suitably extended to $\lambda'_O$, according to
the TGD chase rule, so that $\lambda'_O$ maps each of the variables in
$\mathbf{Z}$ to a fresh null in $\Delta_N$, not already present in
$Ochase^m(D,\Sigma)$; then, all atoms in
$\lambda'_O(\Psi(\mathbf{X},\mathbf{Z}))$ are added to $Ochase^m(D,\Sigma)$,
thus obtaining $Ochase^{m+1}(D,\Sigma)$. There also exists another
homomorphism $\lambda_R$ that maps $\Phi(\mathbf{X},\mathbf{Y})$ to atoms of
$Rchase(D,\Sigma)$; in particular, $\lambda_R = \lambda_O \circ \mu_m$
(first $\lambda_O$, then $\mu_m$). Since $Rchase(D,\Sigma)$ satisfies all the
dependencies in $\Sigma$ (and so does $Ochase(D,\Sigma)$), there is an
extension $\lambda'_R$ of $\lambda_R$ that maps
$\Psi(\mathbf{X},\mathbf{Z})$ to tuples of $Rchase(D,\Sigma)$. Denoting
$\mathbf{Z} = Z_1, \ldots, Z_k$, we now define

$$\mu_{m+1} = \mu_m \cup \{\lambda'_O(Z_i) \rightarrow \lambda'_R(Z_i)\}_{1 \leq i \leq k}$$

To complete the proof, we now need to show that $\mu_{m+1}$ is indeed a
homomorphism. The addition of $\lambda'_O(Z_i) \rightarrow \lambda'_R(Z_i)$,
with $1 \leq i \leq k$, is compatible with $\mu_m$, because none of the
$\lambda'_O(Z_i)$ appears in $\mu_m$; therefore $\mu_{m+1}$ is a well-defined
mapping. Now, consider a generic atom $r(\mathbf{X},\mathbf{Z})$ in
$\Psi(\mathbf{X},\mathbf{Z})$; $\lambda'_O(r(\mathbf{X},\mathbf{Z}))$ is the
(single) atom added to $Ochase^m(D,\Sigma)$ in the $(m+1)$-th step; notice
that:

$$\mu_{m+1}(\lambda'_O(r(\mathbf{X},\mathbf{Z}))) = \mu_{m+1}(r(\lambda'_O(\mathbf{X}), \lambda'_O(\mathbf{Z}))) = r(\mu_{m+1}(\lambda'_O(\mathbf{X})), \mu_{m+1}(\lambda'_O(\mathbf{Z})))$$

Also, notice that $\mu_{m+1}(\lambda'_O(\mathbf{X})) =
\mu_{m+1}(\lambda_O(\mathbf{X})) = \lambda_R(\mathbf{X}) =
\lambda'_R(\mathbf{X})$, and $\mu_{m+1}(\lambda'_O(\mathbf{Z})) =
\lambda'_R(\mathbf{Z})$. Therefore,

$$\mu_{m+1}(\lambda'_O(r(\mathbf{X},\mathbf{Z}))) = r(\lambda'_R(\mathbf{X}), \lambda'_R(\mathbf{Z})) = \lambda'_R(r(\mathbf{X},\mathbf{Z}))$$

which is in $Rchase(D,\Sigma)$ by construction.

The desired homomorphism from $Ochase(D,\Sigma)$ to $Rchase(D,\Sigma)$ is
eventually $\mu = \bigcup_{i=0}^{\infty} \mu_i$. □

Corollary 12. Given a set $\Sigma$ of TGDs on a relational schema
$\mathcal{R}$ and a database $D$ for $\mathcal{R}$, $Ochase(D,\Sigma)$ is a
universal solution for $D$ under $\Sigma$.

Corollary 13. Given a Boolean query $Q$ over a schema $\mathcal{R}$, a
database $D$ for $\mathcal{R}$, and a set of TGDs $\Sigma$,
$Ochase(D,\Sigma) \models Q$ if and only if $Rchase(D,\Sigma) \models Q$.

Since it is usually more convenient, from the technical point of view, to
deal with the oblivious chase, in the following, unless explicitly stated
otherwise, "chase" will mean oblivious chase, and $chase(D,\Sigma)$ will
stand for $Ochase(D,\Sigma)$.

2.7 Decision problems 

Recall that, given a database $D$, a set $\Sigma$ of TGDs, and a Boolean
conjunctive query $Q$, by Theorem 8, $D \cup \Sigma \models Q$ iff
$chase(D,\Sigma) \models Q$. Based on this, we define two relevant decision
problems and prove their LOGSPACE-equivalence.

Definition 14. The conjunctive query evaluation decision problem CQeval is
defined as follows. Given a conjunctive query $Q$ with $n$-ary head
predicate $q$, a set of TGDs $\Sigma$, a database $D$ and a ground $n$-tuple
$\mathbf{t}$, decide whether $q(\mathbf{t}) \in ans(Q, D, \Sigma)$ or,
equivalently, whether $chase(D,\Sigma) \cup \{Q\} \models q(\mathbf{t})$.

Definition 15. The Boolean conjunctive query evaluation problem BCQeval is
defined as follows. Given a Boolean conjunctive query $Q$, a set of TGDs
$\Sigma$, and a database $D$, decide whether $chase(D,\Sigma) \models Q$.

The following result is implicit in [24].

Lemma 16. The problems CQeval and BCQeval are LOGSPACE-equivalent.

Proof. Notice that BCQeval can be trivially made into a special instance of
CQeval, e.g., by adding a propositional atom as head atom. It thus suffices
to show that CQeval polynomially reduces to BCQeval. Let $(Q, D, \Sigma,
q(\mathbf{t}))$ be an instance of CQeval, where $q/n$ is the head predicate
of $Q$ and $\mathbf{t}$ is a ground $n$-tuple. Assume the head atom of $Q$ is
$q(X_1, \ldots, X_n)$ and $q(\mathbf{t}) = q(c_1, \ldots, c_n)$. Then, define
$Q'$ to be the Boolean conjunctive query whose atoms are those in $body(Q)$
plus $q'(X_1, \ldots, X_n)$, where $q'$ is a fresh predicate symbol not
occurring in $D$ and $Q$ (and therefore not in $\Sigma$, since the TGDs in
$\Sigma$ are on the same relational schema). It is easy to see that
$q(\mathbf{t}) \in Q(chase(D,\Sigma))$ iff $chase(D \cup \{q'(c_1, \ldots,
c_n)\}, \Sigma) \models Q'$. □

By the above lemma, and by the well-known equivalence of the problem of
query containment under TGDs with the CQeval problem (Corollary 10), the
following three problems are LOGSPACE-equivalent: (1) CQeval under TGDs,
(2) BCQeval under TGDs, (3) query containment under TGDs. Henceforth, we
will concentrate on only one of these problems, namely the BCQeval problem.
All complexity results carry over to the other problems.

Dealing with multiple head-atoms. From the technical point of view, it
turns out that dealing with multiple atoms in TGD heads complicates the proof
techniques. Henceforth, we shall therefore assume that all TGDs have a single atom
in their head. After proving our results for TGDs with a single head-atom, we
shall extend all such results to the case of multiple-atom heads in Section 8.

2.8 Tree decomposition and related notions 

We now give some preliminary notions about tree decompositions. A hypergraph
is a pair $\mathcal{H} = (V, H)$, where every $h \in H$ is called a hyperedge and is a subset
of $V$. The Gaifman graph of a hypergraph $\mathcal{H}$, denoted by $\mathcal{G}_{\mathcal{H}}$, is an undirected
graph having the same $V$ as set of nodes, and such that there is an edge $(v_1, v_2)$
iff $v_1$ and $v_2$ both occur in the same hyperedge in $H$.

Given a graph $\mathcal{G} = (V, E)$, a tree decomposition of $\mathcal{G}$ is a pair $(T, \lambda)$, where
$T = (N, A)$ is a tree, and $\lambda$ a labeling function $\lambda : N \rightarrow 2^V$ such that:

(i) for all $v \in V$, there exists $n \in N$ such that $v \in \lambda(n)$; more briefly,
$\lambda(N) = V$ (where $\lambda(N)$ denotes $\bigcup_{n \in N} \lambda(n)$);

(ii) for every arc $e \in E$, with $e = (v_1, v_2)$, there exists $n \in N$ such that
$\lambda(n) \supseteq \{v_1, v_2\}$;

(iii) for every $v \in V$, the set $\{n \in N \mid v \in \lambda(n)\}$ induces a connected subtree
in $T$.

The width of a tree decomposition $(T, \lambda)$ is the integer value $\max\{|\lambda(n)| - 1 \mid n \in N\}$.
The treewidth of a graph $\mathcal{G} = (V, E)$, denoted by $tw(\mathcal{G})$, is the minimum
width of all tree decompositions of $\mathcal{G}$. Given a hypergraph $\mathcal{H}$, its treewidth $tw(\mathcal{H})$
is defined as the treewidth of its Gaifman graph: $tw(\mathcal{H}) = tw(\mathcal{G}_{\mathcal{H}})$. Notice that
the notion of treewidth immediately extends to structures; therefore, since we
can see database instances and queries as structures, the treewidth of databases
and queries is defined.
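Conditions (i) to (iii) and the width can be checked mechanically on small examples. The following Python sketch is illustrative only; the function names and the dictionary-based encoding of the labeling function are assumptions, and the input arcs are assumed (not verified) to form a tree.

```python
def is_tree_decomposition(vertices, edges, tree_arcs, label):
    """Check conditions (i)-(iii) for a graph (vertices, edges).

    `label` maps each tree node to a set of graph vertices (its bag);
    `tree_arcs` lists the arcs of T, assumed to form a tree.
    """
    # (i) the bags cover exactly the vertex set
    if set().union(*label.values()) != set(vertices):
        return False
    # (ii) both endpoints of every edge occur together in some bag
    for v1, v2 in edges:
        if not any({v1, v2} <= bag for bag in label.values()):
            return False
    # (iii) the nodes whose bag contains v induce a connected subtree
    adjacent = {n: set() for n in label}
    for n1, n2 in tree_arcs:
        adjacent[n1].add(n2)
        adjacent[n2].add(n1)
    for v in vertices:
        holders = {n for n, bag in label.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for m in adjacent[n] & holders:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        if seen != holders:
            return False
    return True


def width(label):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in label.values()) - 1


if __name__ == "__main__":
    # Gaifman graph of the hyperedges {a, b, c} and {c, d}
    V = {"a", "b", "c", "d"}
    E = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    bags = {"n1": {"a", "b", "c"}, "n2": {"c", "d"}}
    print(is_tree_decomposition(V, E, [("n1", "n2")], bags), width(bags))
```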

A class $C$ of first-order formulae enjoys the bounded-treewidth model property
if there exists some $k$ such that, for every theory $T \subseteq C$, whenever a formula
$\phi \in T$ is satisfiable, then $\phi$ has a model of treewidth at most $k$.

The following result follows straightforwardly from [531 EE].

Theorem 17. If a class of first-order formulae $C$ has the bounded-treewidth
model property, then for every theory $T \subseteq C$ checking satisfiability for formulae
in $T$ is decidable.

3 Decidability 

This section introduces the special classes of guarded and weakly-guarded TGDs,
which enjoy several useful properties. As mentioned in the introduction, we show
that query answering under the aforementioned classes is decidable.

Definition 18. Given a TGD $\sigma$ of the form $\Phi(\mathbf{X}, \mathbf{Y}) \rightarrow \Psi(\mathbf{X}, \mathbf{Z})$, we say that
$\sigma$ is a (fully) guarded TGD (GTGD) if there exists an atom in the body, called
a guard, that contains all the universally quantified variables of $\sigma$, i.e., all the
variables $\mathbf{X}, \mathbf{Y}$ that occur in $\Phi(\mathbf{X}, \mathbf{Y})$.
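Guardedness is a purely syntactic test on the body of a TGD. The sketch below is a minimal illustration, assuming that atoms are encoded as (predicate, argument tuple) pairs and that variables are strings starting with an uppercase letter; both conventions and the function names are assumptions of this example.

```python
def is_variable(term):
    """Convention of this sketch: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def guards(body):
    """Return the body atoms that contain every variable of the body."""
    body_vars = {t for _, args in body for t in args if is_variable(t)}
    return [atom for atom in body if body_vars <= set(atom[1])]


def is_guarded(body):
    """A TGD is guarded iff at least one body atom is a guard."""
    return bool(guards(body))


if __name__ == "__main__":
    print(is_guarded([("r", ("X", "Y")), ("s", ("X",))]))        # guard: r(X, Y)
    print(is_guarded([("next", ("X1", "X2")), ("next", ("Y1", "Y2"))]))
```

All variables occurring in the body of a TGD are universally quantified, so it suffices to inspect the body.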

We now introduce weakly guarded sets of TGDs. We first give the notion
of affected position of a relational schema, given a set of TGDs $\Sigma$. Intuitively,
a position $\pi$ is affected in a set of TGDs $\Sigma$ if there exists a database $D$ such
that a labeled null appears in some atom of $\mathit{chase}(D, \Sigma)$ at position $\pi$. The
importance of affected positions for our definitions is that no labeled null can
appear in non-affected positions. We are now ready for the formal definition,
which is in inductive form.

Definition 19. Given a relational schema $\mathcal{R}$ and a set of TGDs $\Sigma$ over $\mathcal{R}$, a
position $\pi_h$ in the predicate of the head atom of a TGD $\sigma$ in $\Sigma$ is affected with
respect to $\Sigma$ if either:

• (base case) an existentially quantified variable appears in $\pi_h$, or

• (inductive case) the variable appearing at position $\pi_h$ in $\mathit{head}(\sigma)$ also
appears in $\mathit{body}(\sigma)$, and only at affected positions.

Example 1. Consider the following set of TGDs:

$$
\begin{aligned}
\sigma_1 &: \; p_1(X, Y), p_2(X, Y) \rightarrow \exists Z \; p_2(Y, Z) \\
\sigma_2 &: \; p_2(X, Y), p_2(W, X) \rightarrow p_1(Y, X)
\end{aligned}
$$

Notice that $p_2[2]$ is affected since $Z$ is existentially quantified in $\sigma_1$. Considering
again $\sigma_1$, the variable $Y$ appears in $p_2[2]$ but also in $p_1[2]$, therefore
it does not make the position $p_2[1]$ affected. In $\sigma_2$, $X$ appears in the affected
position $p_2[2]$ but also in $p_2[1]$, which is not affected; therefore, it does not make
the position $p_1[2]$ affected. Differently, in $\sigma_2$, $Y$ appears in $p_2[2]$ and nowhere
else, thus causing $p_1[1]$ to be affected. □
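The inductive definition of affected positions amounts to a least-fixpoint computation. The following Python sketch runs it on the two TGDs of Example 1; the encoding (single-head TGDs as (body, head) pairs, atoms as (predicate, arguments) pairs, uppercase strings as variables, positions as (predicate, index) pairs) is an assumption of this illustration.

```python
def is_variable(term):
    """Convention of this sketch: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def affected_positions(tgds):
    """Least fixpoint of the base and inductive cases for single-head TGDs."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for body, head in tgds:
            # positions at which each variable occurs in the body
            body_positions = {}
            for pred, args in body:
                for i, term in enumerate(args, start=1):
                    if is_variable(term):
                        body_positions.setdefault(term, set()).add((pred, i))
            head_pred, head_args = head
            for i, term in enumerate(head_args, start=1):
                position = (head_pred, i)
                if not is_variable(term) or position in affected:
                    continue
                if term not in body_positions:
                    # base case: the variable is existentially quantified
                    affected.add(position)
                    changed = True
                elif body_positions[term] <= affected:
                    # inductive case: only affected positions in the body
                    affected.add(position)
                    changed = True
    return affected


SIGMA_1 = ([("p1", ("X", "Y")), ("p2", ("X", "Y"))], ("p2", ("Y", "Z")))
SIGMA_2 = ([("p2", ("X", "Y")), ("p2", ("W", "X"))], ("p1", ("Y", "X")))

if __name__ == "__main__":
    print(sorted(affected_positions([SIGMA_1, SIGMA_2])))
```

On this input the computed set consists of $p_2[2]$ and $p_1[1]$, in agreement with the discussion in Example 1.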

Definition 20. Consider a set of TGDs $\Sigma$ on a schema $\mathcal{R}$. A TGD $\sigma \in \Sigma$ of the
form $\Phi(\mathbf{X}, \mathbf{Y}) \rightarrow \Psi(\mathbf{X}, \mathbf{Z})$ is said to be weakly guarded with respect to $\Sigma$ if there
exists an atom in $\mathit{body}(\sigma)$, called a weak guard, that contains all the universally
quantified variables of $\sigma$ that appear in affected positions w.r.t. $\Sigma$ and do not
also appear in non-affected positions w.r.t. $\Sigma$ (see Definition 19). The set $\Sigma$ is
said to be a weakly guarded set of TGDs if each TGD $\sigma \in \Sigma$ is weakly guarded
w.r.t. $\Sigma$.

In the following, we use the notion of guard of a GTGD (or of a TGD in a 
weakly guarded set) also in the presence of more than one guard; in such cases, 
we assume we take the lexicographically first among the guards; we could also 
pick any other criterion for choosing the guard, without changing anything in 
our proofs or results. 
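Weak guardedness can be tested in the same syntactic way once the affected positions are known. In the sketch below the set of affected positions is passed in as a parameter (for the TGDs of Example 1 it is $\{p_2[2], p_1[1]\}$); the encoding of atoms and positions and the function names are assumptions of this illustration.

```python
def is_variable(term):
    """Convention of this sketch: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def weak_guards(body, affected):
    """Body atoms containing every variable that occurs only at affected positions."""
    positions = {}
    for pred, args in body:
        for i, term in enumerate(args, start=1):
            if is_variable(term):
                positions.setdefault(term, set()).add((pred, i))
    must_cover = {v for v, occ in positions.items() if occ <= affected}
    return [atom for atom in body if must_cover <= set(atom[1])]


def is_weakly_guarded_set(bodies, affected):
    """A set of TGDs is weakly guarded iff each body has a weak guard."""
    return all(weak_guards(body, affected) for body in bodies)


AFFECTED = {("p2", 2), ("p1", 1)}                  # from Example 1
BODY_1 = [("p1", ("X", "Y")), ("p2", ("X", "Y"))]   # body of sigma_1
BODY_2 = [("p2", ("X", "Y")), ("p2", ("W", "X"))]   # body of sigma_2

if __name__ == "__main__":
    print(weak_guards(BODY_2, AFFECTED))
    print(is_weakly_guarded_set([BODY_1, BODY_2], AFFECTED))
```

Here $\sigma_2$ is not guarded, since no body atom contains $X$, $Y$ and $W$, but it is weakly guarded: only $Y$ occurs exclusively at affected positions, and $p_2(X, Y)$ contains it.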

The following theorem shows that it is essentially non-guarded rules that are 
responsible for the undecidability of the main problems treated in this paper. 
Even a single unguarded rule can destroy the decidability of the simplest reasoning
tasks under TGDs.

Theorem 21. There exists a fixed set of TGDs $\Sigma_u$ such that all but one of the TGDs
of $\Sigma_u$ are guarded, and a Boolean conjunctive query $Q$ such that it is undecidable
to determine whether $D \cup \Sigma_u \models Q$, or, equivalently, whether $\mathit{chase}(D, \Sigma_u) \models Q$.

Proof. The proof hinges on the fact that with appropriate input facts $D$, using
a fixed set of TGDs comprising guarded TGDs and a single unguarded TGD, it
is possible to force an infinite grid to appear in $\mathit{chase}(D, \Sigma_u)$. By a further set
of guarded rules, one can then easily simulate the behaviour of a deterministic
Turing machine (TM) $\mathcal{M}$ with an empty input tape. This is done by using
the infinite grid, where the $i$-th horizontal line of the grid represents the tape
content at instant $i$. We assume that transitions of the Turing machine $\mathcal{M}$
are encoded into a relation $\mathit{trans}$ of $D$, where for example, the ground atom
$\mathit{trans}(s_1, a_1, s_2, a_2, \mathit{right})$ means "if the current state is $s_1$ and symbol $a_1$ is read,
then switch to state $s_2$, write $a_2$, and move to the right".

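We show how the infinite grid is defined. Let $D$ contain (among other
initialization atoms that fix the initial configuration of $\mathcal{M}$) the atom $\mathit{index}(0)$,
which fixes the initial point of the grid. Also, we make use of three constants
$\mathit{right}$, $\mathit{left}$, $\mathit{stay}$ for encoding the three types of moves. Consider the following
TGDs:

$$
\begin{aligned}
\mathit{index}(X) &\rightarrow \exists Y \; \mathit{next}(X, Y) \\
\mathit{next}(X, Y) &\rightarrow \mathit{index}(Y) \\
\mathit{trans}(\mathbf{T}), \mathit{next}(X_1, X_2), \mathit{next}(Y_1, Y_2) &\rightarrow \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2)
\end{aligned}
$$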

Note that only the last of these three TGDs is unguarded. The above TGDs
define an infinite grid whose points have co-ordinates $X$ and $Y$ (horizontal
and vertical, respectively), and where for each point its horizontal and vertical
successors are also encoded, and where, in addition, each point appears together
with each possible transition rule. It is not hard to see that we can simulate
the progress of our Turing machine $\mathcal{M}$ using suitable initialization atoms in $D$
and guarded TGDs. To this aim, we need additional predicates $\mathit{cursor}(Y, X)$
(the cursor is in position $X$ at time $Y$), $\mathit{state}(Y, S)$ ($\mathcal{M}$ is in state $S$ at time $Y$),
$\mathit{content}(X, Y, A)$ (at time $Y$, the content of position $X$ in the tape is $A$). The
following rule encodes the behaviour of the TM $\mathcal{M}$ on all transition rules that
move the cursor to the right:

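$$
\begin{aligned}
&\mathit{grid}(S_1, A_1, S_2, A_2, \mathit{right}, X_1, Y_1, X_2, Y_2), \\
&\mathit{cursor}(Y_1, X_1), \mathit{state}(Y_1, S_1), \mathit{content}(X_1, Y_1, A_1) \rightarrow \\
&\qquad \mathit{cursor}(Y_2, X_2), \mathit{content}(X_1, Y_2, A_2), \mathit{state}(Y_2, S_2), \mathit{mark}(Y_1, X_1)
\end{aligned}
$$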

Such a rule has also obvious sibling rules for "left" and "stay" moves. 

Notice that the $\mathit{mark}$ predicate in the head marks the tape cell that is modified
at instant $Y_1$. We now need additional "inertia" rules, which ensure that all
other positions in the tape are not modified between $Y_1$ and the following time
instant $Y_2$. To this aim, at every instant $Y_1$, we adopt two different markings:
$\mathit{keep}_f$ for the tape positions that follow the one marked with $\mathit{mark}$, and $\mathit{keep}_p$ for
those that precede it. In this way, we are able, by making use of guarded rules
only, to ensure that every tape cell $X$ such that $\mathit{keep}_p(Y_1, X)$ or $\mathit{keep}_f(Y_1, X)$ is
true keeps the same symbol at the following instant $Y_2$. The rules below then
propagate the aforementioned markings forward and backwards, respectively,
starting from the marked tape positions.

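$$
\begin{aligned}
\mathit{mark}(Y_1, X_1), \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2) &\rightarrow \mathit{keep}_f(Y_1, X_2) \\
\mathit{keep}_f(Y_1, X_1), \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2) &\rightarrow \mathit{keep}_f(Y_1, X_2) \\
\mathit{mark}(Y_1, X_2), \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2) &\rightarrow \mathit{keep}_p(Y_1, X_1) \\
\mathit{keep}_p(Y_1, X_2), \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2) &\rightarrow \mathit{keep}_p(Y_1, X_1)
\end{aligned}
$$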

The actual inertia rules follow, for all $a \in \{a_1, \ldots, a_\ell, b\}$, where $\{a_1, \ldots, a_\ell, b\}$
is the tape alphabet.

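$$
\begin{aligned}
\mathit{keep}_f(Y_1, X_1), \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2), \mathit{content}(X_1, Y_1, a) &\rightarrow \mathit{content}(X_1, Y_2, a) \\
\mathit{keep}_p(Y_1, X_1), \mathit{grid}(\mathbf{T}, X_1, Y_1, X_2, Y_2), \mathit{content}(X_1, Y_1, a) &\rightarrow \mathit{content}(X_1, Y_2, a)
\end{aligned}
$$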

Notice that we use the constant $a$ instead of a variable in the above rules in
order to have the guardedness property. We therefore need two rules as above for
every tape symbol, that is, $2\ell + 2$ inertia rules altogether. Observe also that the
fact that some rules above have multiple atoms in the head is not a problem, as
such rules have no existentially quantified variables in the head. Therefore, each
TGD with multiple head-atoms can be easily replaced by a set of TGDs with
single-atom heads, and all having the same body. Finally, we assume without
loss of generality that our Turing machine $\mathcal{M}$ has a single halting state $s_0$ which
is encoded by the atom $\mathit{halt}(s_0)$ in $D$. We add a guarded rule

$$\mathit{state}(Y, S), \mathit{halt}(S) \rightarrow \mathit{stop}$$

It is now clear that the machine halts iff $\mathit{chase}(D, \Sigma_u) \models \mathit{stop}$, i.e., iff $D \cup \Sigma_u \models \mathit{stop}$.
We have thus reduced the halting problem to the problem of answering
atomic queries to a database under $\Sigma_u$. The latter problem is thus undecidable.
□
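To see why the chase under the rules $\mathit{index}(X) \rightarrow \exists Y \, \mathit{next}(X, Y)$ and $\mathit{next}(X, Y) \rightarrow \mathit{index}(Y)$ of the above proof does not terminate, one can run it for a bounded number of rounds. The Python sketch below is a hand-coded illustration of these two rules only; the function name, the round-based strategy and the null names `z1, z2, ...` are assumptions of this example.

```python
def chase_index_next(rounds):
    """Apply the two rules for `rounds` parallel rounds, starting from index(0)."""
    facts = {("index", ("0",))}
    fired = set()   # index atoms on which the existential rule was already applied
    fresh = 0       # counter used to name labeled nulls z1, z2, ...
    for _ in range(rounds):
        new = set()
        for pred, args in facts:
            if pred == "index" and args not in fired:
                fired.add(args)
                fresh += 1
                new.add(("next", (args[0], f"z{fresh}")))  # invent a fresh null
            elif pred == "next":
                new.add(("index", (args[1],)))
        if new <= facts:
            break       # fixpoint reached (never happens for these rules)
        facts |= new
    return facts


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, len(chase_index_next(k)))
```

Each round adds one new atom, so the number of atoms grows without bound as the number of rounds increases.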
Definition 22 (Guarded chase forest). Given a weakly guarded set of TGDs
$\Sigma$ and a database $D$, the guarded chase forest for $D$ and $\Sigma$, denoted $\mathit{gcf}(D, \Sigma)$,
is defined as follows.

(a) For each atom $d \in D$, there is a node in the forest labelled with $d$; such
nodes are the roots of the trees in the forest.

(b) For every node labeled with an atom $a \in \mathit{chase}(D, \Sigma)$, and for every
atom $b$ obtained from $a$ and possibly other atoms by a one-step application
of some TGD $\sigma \in \Sigma$, with $a$ as image of the guard through the homomorphism
corresponding to the application, there is one node labeled with $b$
and an arc from $a$ to $b$.

Definition 23. Let $D$ be a possibly infinite instance with values in $\Delta \cup \Delta_N \cup \Delta_V$
for a schema $\mathcal{R}$, and let $S$ be a set of symbols such that $S \subseteq \mathit{dom}(D)$.

• An $[S]$-join forest $(F, \mu)$ of $D$ is an undirected labeled forest $F = (V, E)$
whose labeling function $\mu : V \rightarrow D$ is such that:

(1) for each atom $d$ in $D$, there exists $v \in V$ such that $\mu(v) = d$;

(2) $F$ is $[S]$-connected, i.e., for each $c \in \mathit{dom}(D) - S$, the set $\{v \in V \mid c \in \mathit{dom}(\mu(v))\}$
induces a connected subtree in $F$.

• We say that $D$ is $[S]$-acyclic iff $D$ has an $[S]$-join forest.

The above definition generalizes the classical notion of hypergraph acyclicity
[5] of an instance (or, equivalently, of a query). In fact, an instance or a
query, seen as a hypergraph, is hypergraph-acyclic if and only if it is $[\emptyset]$-acyclic
(see also [31]).

The following lemma follows from the definition of $[S]$-acyclicity.

Lemma 24. Given a database instance $D$ for a schema $\mathcal{R}$, and a set $S$, if $D$
is $[S]$-acyclic, then $tw(D) \leq |S| + w$, where $w$ is the maximum predicate arity
in $\mathcal{R}$.

Proof. By hypothesis, $D$ is $[S]$-acyclic and therefore has an $[S]$-join forest $(F, \mu)$,
with $F = (V, E)$. A tree decomposition $(T, \lambda)$, with $T = (N, A)$, is constructed
as follows. We take $N = V \cup \{n_0\}$, where $n_0$ is an auxiliary node. Let us denote
with $V_r$, with $V_r \subseteq V$, the set of nodes which are roots in the $[S]$-join forest $F$;
we then introduce a set of arcs $A_r$, from $n_0$ to each node in $V_r$, and we take
$A = E \cup A_r$. The labeling function is defined as follows.

$$
\lambda(v) =
\begin{cases}
S & \text{for } v = n_0 \\
\mathit{dom}(\mu(v)) \cup S & \text{for } v \neq n_0
\end{cases}
$$

We now show that $(T, \lambda)$ is a tree decomposition. Recalling the definition given
in Section 2.8, (i) holds trivially because $F$ is a join forest and $\mu$ "covers"
all atoms in $D$. As for (ii), we notice that arcs in the Gaifman graph of $D$
are such that for each atom $d = r(c_1, \ldots, c_m)$ in $D$ there is a clique among
nodes $c_1, \ldots, c_m$. Since for the same atom there exists $v \in V$ such that $\mu(v) = d$,
and $\lambda(v) \supseteq \mathit{dom}(\mu(v))$, (ii) holds immediately. Finally we consider
connectedness; let us take a value $c$ appearing in $D$ as argument. If $c \in S$, the
set $\{v \in N \mid \lambda(v) \ni c\}$ is the whole $N$ by construction, therefore connectedness
holds; if $c \notin S$, the set $\{v \in N \mid \lambda(v) \ni c\}$ induces a connected subtree in $F$ and
therefore in $T$, given that $\lambda(v) = \mathit{dom}(\mu(v)) \cup S$. Therefore, (iii) holds. Since every
label contains at most $w + |S|$ values, the width of $(T, \lambda)$ is at most $|S| + w$. □
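The construction in the proof of Lemma 24 is easy to state as code. The following Python sketch builds the labels of the tree decomposition from an $[S]$-join forest; the function name, the node identifier `n0`, and the dictionary encoding of the labeling functions are assumptions of this illustration.

```python
def decomposition_from_join_forest(forest_edges, roots, atom_of, S):
    """Build (nodes, arcs, label) from an [S]-join forest.

    `atom_of` maps each forest node to its atom (predicate, arguments);
    an auxiliary node "n0" (assumed not to clash with forest nodes) is
    linked to every root, and S is added to every label.
    """
    S = set(S)
    n0 = "n0"
    nodes = set(atom_of) | {n0}
    arcs = set(forest_edges) | {(n0, r) for r in roots}
    label = {n0: set(S)}
    for v, (_, args) in atom_of.items():
        label[v] = set(args) | S
    return nodes, arcs, label


if __name__ == "__main__":
    atoms = {"v1": ("r", ("a", "z1")), "v2": ("r", ("z1", "z2")), "v3": ("s", ("b",))}
    nodes, arcs, label = decomposition_from_join_forest(
        [("v1", "v2")], ["v1", "v3"], atoms, {"a", "b"})
    print(max(len(bag) for bag in label.values()) - 1)
```

In this small example $|S| = 2$ and the maximum arity is $w = 2$; the largest label has four elements, so the width is $3 \leq |S| + w$.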
Definition 25. Let $D$ be an instance for a schema $\mathcal{R}$. The Herbrand Base
$HB(D)$ of $D$ is the set of all atoms that can be formed using the predicate
symbols of $\mathcal{R}$ and arguments in $\mathit{dom}(D)$. We define:

• $\mathit{chase}^{\bot}(D, \Sigma) = \mathit{chase}(D, \Sigma) \cap HB(D)$, and

• $\mathit{chase}^{+}(D, \Sigma) = \mathit{chase}(D, \Sigma) - \mathit{chase}^{\bot}(D, \Sigma)$.

Notice that $\mathit{chase}^{\bot}(D, \Sigma) \cup \mathit{chase}^{+}(D, \Sigma) = \mathit{chase}(D, \Sigma)$ and
$\mathit{chase}^{\bot}(D, \Sigma) \cap \mathit{chase}^{+}(D, \Sigma) = \emptyset$. Moreover, if $D$ is null-free (which will be the case in
many applications), then $\mathit{chase}^{\bot}(D, \Sigma)$ is the finite set of all null-free atoms
in $\mathit{chase}(D, \Sigma)$, while $\mathit{chase}^{+}(D, \Sigma)$ may be infinite.
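On a finite fragment of the chase, the two parts of Definition 25 can be separated by checking which atoms use only values of $\mathit{dom}(D)$. The sketch below is illustrative; the function name and the (predicate, arguments) encoding are assumptions, and all predicates are assumed to belong to the schema.

```python
def split_chase(chase_atoms, database):
    """Split a finite set of chase atoms into its part inside HB(D) and the rest."""
    dom = {term for _, args in database for term in args}
    bottom = {atom for atom in chase_atoms if set(atom[1]) <= dom}
    plus = set(chase_atoms) - bottom
    return bottom, plus


if __name__ == "__main__":
    D = {("r", ("a", "b"))}
    fragment = {("r", ("a", "b")), ("s", ("b", "z1")), ("t", ("b",))}
    print(split_chase(fragment, D))
```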

Lemma 26. If $\Sigma$ is a weakly-guarded set of TGDs and $D$ an instance, then
$\mathit{chase}(D, \Sigma)$ is $[\mathit{dom}(D)]$-acyclic.

In order to prove this result, we resort to an auxiliary lemma. 

Lemma 27. Let $D$ be an instance and $\Sigma$ a weakly-guarded set of TGDs. Let
$a_s$ be an atom of $\mathit{gcf}(D, \Sigma)$ where the value $c \in \Delta_N$ is first introduced, and let
$a_d$ be an atom of $\mathit{gcf}(D, \Sigma)$ where $c$ appears as argument. Then, $c$ appears in
every atom in the (unique) path from $a_s$ to $a_d$.

Proof. Let us denote with $a_1 = a_s, a_2, \ldots, a_n = a_d$ the path from $a_s$ to $a_d$.
Observe that, by definition of affected position, $c$ appears in affected positions
in whatever atom it appears in. By contradiction, assume that $c$ does not appear
in some intermediate atom in the path. Then, there is an $i$, with $2 \leq i \leq n - 1$,
such that $c$ does not appear in $a_i$, but appears in $a_{i+1}$. Since $c$ appears only in
affected positions, in order to appear in $a_{i+1}$ it has either to appear in $a_i$ or to
be invented during the addition of $a_{i+1}$. The first case is false by hypothesis,
and the second is not possible because $c$ is first introduced in $a_1$ and each fresh
value is introduced only once in the chase. We therefore have a contradiction,
which ends the proof. □

Armed with this preliminary result, we now come to the proof of Lemma 26.

Proof. We prove this result by construction, exhibiting a $[\mathit{dom}(D)]$-join forest
$F = (V, E)$ for $\mathit{chase}(D, \Sigma)$. We take $F$ as $\mathit{gcf}(D, \Sigma)$ and we define, for each
atom $d \in \mathit{gcf}(D, \Sigma)$, the labeling function $\mu$ of $F$ as $\mu(d) = d$. Since each atom
of $\mathit{chase}(D, \Sigma)$ is "covered" by its corresponding node of $F$, it only remains to
show that $\mathit{chase}(D, \Sigma)$ is $[\mathit{dom}(D)]$-connected. Take two distinct atoms $a_1, a_2$
in $\mathit{gcf}(D, \Sigma)$ where the same value $c \in \Delta_N$ appears as argument. $a_1$ and $a_2$ have
a common ancestor $a$ in $\mathit{gcf}(D, \Sigma)$ where $c$ is invented, because if they did not,
the value $c$ would have to be introduced twice in $\mathit{chase}(D, \Sigma)$. By Lemma 27, $c$
appears in all atoms on the paths from $a$ to $a_1$ and from $a$ to $a_2$. It immediately
follows that the set $\{v \in V \mid c \in \mathit{dom}(\mu(v))\}$ induces a connected subtree in $F$. This
proves the claim. □



Lemma 28. If $\Sigma$ is a weakly-guarded set of TGDs and $D$ an instance of a
schema $\mathcal{R}$, then $tw(\mathit{chase}(D, \Sigma)) \leq |\mathit{dom}(D)| + w$, where $w$ is the maximum
predicate arity in $\mathcal{R}$.

Proof. The claim of the lemma is straightforwardly obtained by Lemma 24 and
Lemma 26. □






Theorem 29. Given a relational schema $\mathcal{R}$, a weakly guarded set of TGDs $\Sigma$,
a Boolean conjunctive query $Q$, and a database instance $D$ for $\mathcal{R}$, the problem of
checking whether $D \cup \Sigma \models Q$, or equivalently $\mathit{chase}(D, \Sigma) \models Q$, is decidable.

Proof. We rely on the fact that both $\mathit{chase}(D, \Sigma) \wedge Q$ and $\mathit{chase}(D, \Sigma) \wedge \neg Q$
have a (possibly infinite) model of finite treewidth, when they are satisfiable.
This follows from the fact that $\mathit{chase}(D, \Sigma)$ is universal for $D$ under $\Sigma$ and
has finite treewidth (see Lemma 28). Our claim now follows by a well-known
result of Courcelle [28], that generalizes an earlier result of Rabin [55]. This
result states that classes of first-order logic (more generally, monadic second-order
logic) that enjoy the finite treewidth model property are decidable. A
class $C$ of formulae has the finite-treewidth model property if for each $\phi \in C$,
whenever $\phi$ is satisfiable, then it is possible to compute a number $f(\phi)$ such
that $\phi$ has a model of treewidth at most $f(\phi)$ (see also [33, 31], where a more
general property called the generalized tree-model property is discussed). □

The above theorem establishes decidability of query answering under weakly 
guarded sets of TGDs, but it tells nothing about the complexity. Understanding 
the complexity of query answering under sets of guarded TGDs and weakly- 
guarded sets of TGDs will require novel techniques, which will be the subject 
of the next sections. 

4 Complexity: Lower Bounds 

In this section we present several complexity lower bounds about the (decision) 
problem of answering Boolean conjunctive queries under guarded TGDs and 
weakly-guarded sets of TGDs. 

Theorem 30. The problem BCQeval under weakly-guarded sets of TGDs is
EXPTIME-hard in case the TGDs are fixed. The same problem is 2EXPTIME-hard
when the maximum predicate arity $w$ is not bounded. Both hardness results
also hold for fixed atomic ground queries.

Proof. It is well-known that APSPACE (alternating PSPACE, see [55]) equals
EXPTIME. Notice that alternating LINSPACE is already EXPTIME-hard, so to prove
our claim it suffices to simulate the behavior of an Alternating Turing Machine
(ATM) $\mathcal{M}$ on an input $I$ (which is a bit-string) by means of a weakly
guarded set of TGDs $\Sigma$ and an instance $D$. Namely, we will exhibit a BCQ $Q$
and we will show that $\mathcal{M}$ accepts the input $I$ iff $D \cup \Sigma \models Q$, or equivalently
$\mathit{chase}(D, \Sigma) \models Q$.

We start from the case of fixed $\Sigma$. Without loss of generality, we can assume
that the ATM $\mathcal{M}$ has exactly one accepting (halting) state, which we denote
with $s_f$. We also assume that $\mathcal{M}$ never tries to read beyond its tape boundaries.
Let $\mathcal{M}$ be defined as

$$\mathcal{M} = (S, A, b, \delta, s_0, F)$$

where $S$ is the set of states, $A$ is the tape alphabet (assumed to be $\{0, 1, b\}$), $b$
is the blank tape symbol, $\delta$ is the transition function, defined as $\delta : S \times A \rightarrow
(S \times A \times \{\ell, r, \bot\})^2$ ($\bot$ denotes the "stay" head move, while $\ell$ and $r$ denote
"left" and "right" respectively), $s_0 \in S$ is the initial state, and $F \subseteq S$ is the
set of final states. Since $\mathcal{M}$ is an alternating Turing machine (ATM), the set of
states $S$ is partitioned into two sets $S_\forall$ and $S_\exists$ (universal and existential states,
respectively). The general idea of the encoding is that configurations of $\mathcal{M}$ will
be represented by the fresh constants that are generated in the construction of
the chase. In particular, a special constant $\kappa$ will represent the initial configuration,
while $v_i$, where $i \geq 1$, will denote the fresh constants that are generated
by the chase.

The relational schema. We now describe the predicates of the schema which
we use in the reduction. Notice that the schema is fixed and does not depend
on the particular ATM that we encode. The schema predicates are as follows.

(1) Tape. The ternary predicate $\mathit{symbol}(a, c, v)$ denotes that in configuration
$v$ the cell $c$ contains the symbol $a$, with $a \in A$. Also, a binary
predicate $\mathit{succ}(c_1, c_2)$ denotes the fact that cell $c_1$ follows cell $c_2$ on the
tape. Finally, $\mathit{neq}(c_1, c_2)$ says that two cells are distinct.

(2) States. A binary predicate $\mathit{state}(s, v)$ says that in configuration $v$ the
ATM $\mathcal{M}$ is in state $s$. We use three additional unary predicates, $\mathit{existential}$,
$\mathit{universal}$, and $\mathit{accept}$: $\mathit{existential}(s)$ (resp. $\mathit{universal}(s)$) denotes that
the state $s$ is existential (resp. universal), while $\mathit{accept}(s)$ expresses the
fact that the state $s$ is an accepting state.

(3) Configurations. A unary predicate $\mathit{config}(c)$ expresses the fact that
the constant $c$ is a valid configuration. A ternary predicate $\mathit{next}(v, v_1, v_2)$
is used to say that both configurations $v_1$ and $v_2$ are derived from $v$.
Similarly, we use $\mathit{follows}(v, v')$ to say that configuration $v'$ is derived from
$v$. Finally, a unary predicate $\mathit{init}(v)$ states that the configuration $v$ is
initial.

(4) Head (cursor). We use the fact $\mathit{cursor}(c, v)$ to say that the head (cursor)
of the ATM is on cell $c$ in configuration $v$.

(5) Marking. Similarly to what is done in the proof of Theorem 21, we use
$\mathit{mark}(c, v)$ to say that a cell $c$ is marked in a configuration $v$. Our TGDs
will ensure that all non-marked cells keep their symbols in a transition
from one configuration to another.

(6) Transition function. To represent the whole transition function
$\delta$ of an ATM, we use a single 8-ary predicate $\mathit{transition}$: for every
transition rule $\delta(s, a) = ((s_1, a_1, m_1), (s_2, a_2, m_2))$ we will have
$\mathit{transition}(s, a, s_1, a_1, m_1, s_2, a_2, m_2)$.

The database instance $D$. We construct a database out of distinct (and
possibly infinite) alphabets of cells, configurations, and states. For brevity, we
will not specify to which alphabet each constant belongs, since this will be
obvious from the context. In particular, we will use an accepting state $s_a$ and
an initial state $s_0$, and a special initial configuration $\kappa$. The database describes
the initial configuration of the ATM, plus some technicalities.

(a) We consider the input $I$ which, without loss of generality, is assumed
to occupy the cells numbered from 1 to $n$, i.e., $c_1, \ldots, c_n$. Therefore, for
the $i$-th cell of $I$ containing the tape symbol $a$, the database has the fact
$\mathit{symbol}(a, c_i, \kappa)$.

(b) An atom $\mathit{state}(s_0, \kappa)$ specifies that $\mathcal{M}$ starts in state $s_0$ in the
initial configuration $\kappa$.

(c) For every existential state $s_E$ and every universal state $s_U$, we have the facts
$\mathit{existential}(s_E)$ and $\mathit{universal}(s_U)$. For the accepting state, the database
has the fact $\mathit{accept}(s_a)$.

(d) An atom $\mathit{cursor}(c_1, \kappa)$ indicates that, in the initial configuration, the
cursor points at the first cell.

(e) The atoms $\mathit{succ}(c_1, c_2), \ldots, \mathit{succ}(c_{n-1}, c_n)$ encode the fact that the cells
$c_1, \ldots, c_n$ are adjacent. Also, atoms of the form $\mathit{neq}(c_i, c_j)$, for $1 \leq i, j \leq n$,
with $i \neq j$, denote the fact that the cells $c_1, \ldots, c_n$ are pairwise distinct.

(f) The atom $\mathit{config}(\kappa)$ says that $\kappa$ is a valid configuration.

(g) The database has the atoms of the form

$$\mathit{transition}(s, a, s_1, a_1, m_1, s_2, a_2, m_2)$$

which encode the transition function $\delta$, as described above.

The TGDs. Once the database is set, we are ready to describe the TGDs that
define the transitions between configurations and the accepting configurations
of the ATM.

(a) Configuration generation. The following TGDs say that, for every state
(halting or non-halting; we do not mind having configurations that are
derived from a halting one), there are two configurations that follow it,
and that a configuration that follows another configuration is also a valid
configuration (the variables $V_1$ and $V_2$ of the first TGD do not occur in
its body and are therefore existentially quantified):

$$
\begin{aligned}
\mathit{config}(V), \mathit{state}(S, V) &\rightarrow \mathit{next}(V, V_1, V_2) \\
\mathit{next}(V, V_1, V_2) &\rightarrow \mathit{config}(V_1), \mathit{config}(V_2) \\
\mathit{next}(V, V_1, V_2) &\rightarrow \mathit{follows}(V, V_1) \\
\mathit{next}(V, V_1, V_2) &\rightarrow \mathit{follows}(V, V_2)
\end{aligned}
$$

(b) Configuration transition. The following TGD encodes the transition
where the ATM starts at an existential state, moves right in its first
configuration and left in the second. Here $C$ denotes the current cell, $C_1$
and $C_2$ are the new cells in the first and the second configuration (on the
right and on the left of $C$, respectively), $M_1$, $M_2$ represent the two moves,
and the constants $r$ and $\ell$ represent the "right" and the "left" moves,
respectively.

$$
\begin{aligned}
&\mathit{transition}(S, A, S_1, A_1, M_1, S_2, A_2, M_2), M_1 = r, M_2 = \ell, \mathit{next}(V, V_1, V_2), \\
&\mathit{state}(S, V), \mathit{cursor}(C, V), \mathit{symbol}(A, C, V), \mathit{succ}(C_1, C), \mathit{succ}(C, C_2) \rightarrow \\
&\qquad \mathit{state}(S_1, V_1), \mathit{state}(S_2, V_2), \mathit{symbol}(A_1, C_1, V_1), \mathit{symbol}(A_2, C_2, V_2), \\
&\qquad \mathit{cursor}(C_1, V_1), \mathit{cursor}(C_2, V_2), \mathit{mark}(C, V)
\end{aligned}
$$

The other eight kinds of moves of the ATM are encoded by analogous
TGDs. The above rule (and its eight siblings) suitably mark the cells
that are written by the transition by means of the predicate $\mathit{mark}$. The
cells that are not involved in the transition must retain their symbols,
which is specified by the next TGD:

$$
\mathit{config}(V), \mathit{follows}(V_1, V), \mathit{mark}(C, V), \mathit{symbol}(C_1, A, V), \mathit{neq}(C_1, C) \rightarrow \mathit{symbol}(C_1, A, V_1)
$$

(c) Termination. The meaning of the following rule is clear:

$$\mathit{state}(s_a, V) \rightarrow \mathit{accept}(V)$$

The following TGDs state that, for existential states, at least one configuration
derived from it must be accepting. For universal states, both
configurations must be accepting.

$$
\begin{aligned}
\mathit{next}(V, V_1, V_2), \mathit{state}(S, V), \mathit{existential}(S), \mathit{accept}(V_1) &\rightarrow \mathit{accept}(V) \\
\mathit{next}(V, V_1, V_2), \mathit{state}(S, V), \mathit{existential}(S), \mathit{accept}(V_2) &\rightarrow \mathit{accept}(V) \\
\mathit{next}(V, V_1, V_2), \mathit{state}(S, V), \mathit{universal}(S), \mathit{accept}(V_1), \mathit{accept}(V_2) &\rightarrow \mathit{accept}(V)
\end{aligned}
$$
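The effect of the termination rules can be illustrated on a finite tree of configurations by computing the least set of accepting configurations. The sketch below is only an illustration of this fixpoint on hand-written data; the function name and the dictionary encodings of the $\mathit{next}$ and $\mathit{state}$ relations are assumptions, and each configuration is assumed to have a recorded state.

```python
def accepting_configs(next_of, state_of, existential, universal, accepting_state):
    """Least fixpoint of the termination rules on a finite configuration tree.

    `next_of` maps a configuration v to its pair of successors (v1, v2);
    `state_of` maps each configuration to the state of the machine in it.
    """
    accept = {v for v, s in state_of.items() if s == accepting_state}
    changed = True
    while changed:
        changed = False
        for v, (v1, v2) in next_of.items():
            if v in accept:
                continue
            s = state_of[v]
            one = v1 in accept or v2 in accept
            both = v1 in accept and v2 in accept
            if (s in existential and one) or (s in universal and both):
                accept.add(v)
                changed = True
    return accept


if __name__ == "__main__":
    next_of = {"k": ("v1", "v2"), "v1": ("v3", "v4")}
    state_of = {"k": "e", "v1": "u", "v2": "sa", "v3": "sa", "v4": "e"}
    print(sorted(accepting_configs(next_of, state_of, {"e"}, {"u"}, "sa")))
```

In this example the initial configuration `k` is existential and accepts because its second successor does, while the universal configuration `v1` does not accept because only one of its successors does.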

Notice that the above TGDs often have more than one atom in the head,
but since heads have no existentially quantified variables in such rules, it is
trivial to replace each of them with a set of TGDs that have only one predicate
each in the head, and all the same body. The above construction uses multiple
heads for more clarity.

It is not hard to show that the encoding described above is sound and complete.
That is, $\mathcal{M}$ accepts the input $I$ if and only if $\mathit{chase}(D, \Sigma) \models \mathit{accept}(\kappa)$. It is
also easy to verify that the set of TGDs we have used is weakly-guarded. This
proves the claim.

In the case where the arity of predicates in $\mathcal{R}$ is not fixed, instead of simulating
a LINSPACE ATM, we are able to simulate $2^n$ tape cells, where $n$ is the
length of the input. This is done, for example, with a predicate $\mathit{count}/(n + 1)$
which serves as $n$-bit binary counter, and that at the same time generates $2^n$
tape cell symbols $c_0, c_1, \ldots$. The first $n$ arguments of $\mathit{count}$ have values either
1 or 0, while the $(n + 1)$-th argument is the cell corresponding to the number
encoded by the values in the first $n$ arguments; we introduce the single fact
$\mathit{count}(0, \ldots, 0, c_0)$ in the initial database $D$, and the following rules to generate
all numbers and the corresponding tape cell symbols. Notice that, in order to
avoid notational clutter, we use the constants 0, 1 in the rules; this is not necessary
as we could have two unary predicates $\mathit{zero}/1$ and $\mathit{one}/1$ which are true
when the argument is 0 and 1 respectively, and the facts $\mathit{zero}(0)$, $\mathit{one}(1)$ in $D$.

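$$
\begin{aligned}
\mathit{count}(X_1, \ldots, X_{n-1}, 0, C) &\rightarrow \exists C_1 \; \mathit{count}(X_1, \ldots, X_{n-1}, 1, C_1), \mathit{succ}(C, C_1) \\
\mathit{count}(X_1, \ldots, X_{n-2}, 0, 1, C) &\rightarrow \exists C_1 \; \mathit{count}(X_1, \ldots, X_{n-2}, 1, 0, C_1), \mathit{succ}(C, C_1) \\
&\;\;\vdots \\
\mathit{count}(X_1, \ldots, X_{n-i}, 0, 1, \ldots, 1, C) &\rightarrow \exists C_1 \; \mathit{count}(X_1, \ldots, X_{n-i}, 1, 0, \ldots, 0, C_1), \mathit{succ}(C, C_1) \\
&\;\;\vdots \\
\mathit{count}(X_1, 0, 1, \ldots, 1, C) &\rightarrow \exists C_1 \; \mathit{count}(X_1, 1, 0, \ldots, 0, C_1), \mathit{succ}(C, C_1) \\
\mathit{count}(0, 1, \ldots, 1, C) &\rightarrow \exists C_1 \; \mathit{count}(1, 0, \ldots, 0, C_1), \mathit{succ}(C, C_1)
\end{aligned}
$$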



The above rules simply encode the increment of the binary counter. Also in this
case, the multiple head-atoms do not create problems as it is straightforward to
rewrite such rules in order to obtain an equivalent set with single-atom heads
(see Theorem 55). It is straightforwardly seen that the above rules
generate $2^n$ tape cells and an appropriate extension of the predicate $\mathit{succ}/2$ that
represents consecutive cells. It is a simple matter to encode the fact that the
first $n$ cells of the tape contain the input, and all the following ones the blank
symbol $b$.

We can thus prove that the problem in question is ASPACE($2^n$)-hard.$^2$ Since
ASPACE($2^n$) = 2EXPTIME, it immediately follows that when the arity is not
bounded the problem is 2EXPTIME-hard. □
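The behaviour of the counter rules can be simulated directly: each rule application flips the rightmost 0 to 1, resets the trailing 1s to 0, and invents a new cell. The Python sketch below mimics this for a given $n$; the function name and the cell names `c0, c1, ...` are assumptions of this illustration, and the rules are hard-coded rather than run through a generic chase.

```python
def counter_chase(n):
    """Simulate the n-bit counter rules, returning the count and succ facts."""
    bits = (0,) * n
    cell = 0
    count_facts = [bits + (f"c{cell}",)]      # initial fact count(0, ..., 0, c0)
    succ_facts = []
    while 0 in bits:
        i = max(j for j, b in enumerate(bits) if b == 0)   # rightmost 0
        bits = bits[:i] + (1,) + (0,) * (n - i - 1)        # binary increment
        cell += 1                                          # fresh cell C1
        count_facts.append(bits + (f"c{cell}",))
        succ_facts.append((f"c{cell - 1}", f"c{cell}"))    # succ(C, C1)
    return count_facts, succ_facts


if __name__ == "__main__":
    counts, succs = counter_chase(3)
    print(len(counts), len(succs))
```

For $n = 3$ the simulation produces $2^3 = 8$ count facts and 7 succ facts, ending with the all-ones counter value.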

5 Complexity: Upper Bounds 

In this section we present several complexity results about query answering 
under guarded and weakly-guarded TGDs. 

5.1 Squid Decompositions 

In this section we define the notion of a squid decomposition, and prove a lemma 
called "Squid Lemma" which will be a useful tool for proving our complexity 
results in the following sub-sections. 

Definition 31. Let $Q$ be a Boolean conjunctive query over a database schema
$\mathcal{R}$, where $Q$ has $n$ (body) atoms. An $\mathcal{R}$-cover of $Q$ is a Boolean conjunctive
query $Q^+$ over $\mathcal{R}$ that contains in its body all atoms of $Q$ and that may, in
addition, contain at most $n$ further $\mathcal{R}$-atoms whose variables can be either from
$\mathit{var}(Q)$ or new ones.

Example 2. Let $\mathcal{R} = \{r/2, s/3, t/3\}$, and let $Q$ be the Boolean conjunctive
query $\{r(X, Y), r(Y, Z), t(Z, X, X)\}$. The following query $Q^+$ is an $\mathcal{R}$-cover of
$Q$: $Q^+ = \{r(X, Y), r(Y, Z), t(Z, X, X), t(Y, Z, Z), s(Z, U, U)\}$. □

Lemma 32. Let $D$ be a (finite or infinite) instance over a schema $\mathcal{R}$ and $Q$ a
Boolean conjunctive query over $D$. Then $D \models Q$ iff there exists an $\mathcal{R}$-cover $Q^+$
of $Q$ such that $D \models Q^+$.

2 The notation ASPACE($f(n)$) denotes the class of decision problems solved by an alternating
Turing machine in space $f(n)$, where $n$ is of course the input size. An alternative notation for
ASPACE($2^n$) is therefore AEXPSPACE.

Proof. The only-if direction follows trivially from the fact that $Q$ is an $\mathcal{R}$-cover
of itself. The if direction follows straightforwardly from the fact that whenever
there is a homomorphism $h : \mathit{var}(Q^+) \rightarrow \mathit{dom}(D)$, such that $h(Q^+) \subseteq D$,
then, given that $Q$ is a subset of $Q^+$, the restriction $h'$ of $h$ to $\mathit{var}(Q)$ (in
symbols, we define $h' = h|_{\mathit{var}(Q)}$) is a homomorphism $\mathit{var}(Q) \rightarrow \mathit{dom}(D)$ such
that $h'(Q) = h(Q) \subseteq D$. Alternatively, observe that for every cover $Q^+$,
since $Q^+$ has at least all atoms of $Q$, the containment $Q^+ \subseteq Q$ trivially holds,
therefore from $D \models Q^+$ we straightforwardly get $D \models Q$. □
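Both the notion of $\mathcal{R}$-cover and the "if" direction of Lemma 32 can be checked on the query of Example 2. The Python sketch below is illustrative: the function names, the schema dictionary, the sample instance, and the convention that uppercase strings are variables are assumptions of this example, and `entails` is a naive backtracking search for a homomorphism on a finite instance.

```python
def is_variable(term):
    """Convention of this sketch: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def is_cover(schema, q, q_plus):
    """Check that q_plus contains q plus at most |q| further atoms over the schema."""
    q, q_plus = set(q), set(q_plus)
    over_schema = all(schema.get(p) == len(args) for p, args in q_plus)
    return over_schema and q <= q_plus and len(q_plus - q) <= len(q)


def entails(instance, query):
    """Search for a homomorphism from the query atoms into a finite instance."""
    atoms = list(query)

    def extend(i, h):
        if i == len(atoms):
            return True
        pred, args = atoms[i]
        for fact_pred, fact_args in instance:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            h2 = dict(h)
            if all((h2.setdefault(a, f) == f) if is_variable(a) else a == f
                   for a, f in zip(args, fact_args)) and extend(i + 1, h2):
                return True
        return False

    return extend(0, {})


SCHEMA = {"r": 2, "s": 3, "t": 3}
Q = [("r", ("X", "Y")), ("r", ("Y", "Z")), ("t", ("Z", "X", "X"))]
Q_PLUS = Q + [("t", ("Y", "Z", "Z")), ("s", ("Z", "U", "U"))]
D = {("r", ("a", "b")), ("r", ("b", "c")), ("t", ("c", "a", "a")),
     ("t", ("b", "c", "c")), ("s", ("c", "d", "d"))}

if __name__ == "__main__":
    print(is_cover(SCHEMA, Q, Q_PLUS), entails(D, Q_PLUS), entails(D, Q))
```

Since every atom of $Q$ occurs in $Q^+$, any homomorphism found for $Q^+$ restricts to one for $Q$, which is the argument of the proof.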

Definition 33.$^3$ Let $Q$ be a Boolean conjunctive query over a schema $\mathcal{R}$. A
squid decomposition $\delta = (Q^+, h, H, T)$ of $Q$ consists of an $\mathcal{R}$-cover $Q^+$ of $Q$, a
mapping $h : \mathit{var}(Q^+) \rightarrow \mathit{var}(Q^+)$, and a decomposition of $h(Q^+)$ into two sets
$H$ and $T$, with $T = h(Q^+) - H$, such that: (i) there exists $V_\delta \subseteq \mathit{var}(Q^+)$ such
that $H = \{a \in h(Q^+) \mid \mathit{var}(a) \subseteq V_\delta\}$; (ii) $T$ is $[V_\delta]$-acyclic. We refer to $H$ as
the head of $\delta$, and to $T$ as the tentacles of $\delta$.

One may imagine the set $H$ in a squid decomposition as the head of a squid,
and the set $T$ as a forest of tentacles attached to that head. Note that a
squid decomposition $\delta = (Q^+, h, H, T)$ of $Q$ does not necessarily define a
query folding [24, 54] of $Q^+$, because $h$ does not need to be an endomorphism
of $Q^+$: in other terms, we do not require that $h(Q^+) \subseteq Q^+$. Of course, $h$ is a
homomorphism.

Example 3. Consider the following Boolean conjunctive query (the schema is
omitted for brevity):

$$Q = \{\, r(X,Y),\ r(X,Z),\ r(Y,Z),\ r(Z,V_1),\ r(V_1,V_2),\ r(V_2,V_3),\ r(V_3,V_4),\ r(V_4,V_5),$$
$$r(V_1,V_6),\ r(V_6,V_5),\ r(V_5,V_7),\ r(Z,U_1),\ s(U_1,U_2,U_3),\ r(U_3,U_4),\ r(U_3,U_5),\ r(U_4,U_5) \,\}.$$

Let $Q^+$ be the Boolean query where we add the atom $s(U_3, U_4, U_5)$ to the body,
that is, $Q^+ = Q \cup \{s(U_3, U_4, U_5)\}$. A possible squid decomposition $(Q^+, h, H, T)$
can be based on the homomorphism $h$, defined as follows: $h(V_6) = V_2$, $h(V_4) =
h(V_5) = h(V_7) = V_3$, and $h(\xi) = \xi$ for each other variable $\xi$ of $Q^+$. The result of
the squid decomposition with $V_\delta = \{X, Y, Z\}$ is the query whose join graph (see footnote 4) is
shown in Figure 2, where we can distinguish the (cyclic) head from the (acyclic)
tentacles. Note that if we eliminated the additional atom $s(U_3, U_4, U_5)$, the
original set of atoms $\{r(U_3, U_4), r(U_3, U_5), r(U_4, U_5)\}$ would form a non-$[V_\delta]$-acyclic
cycle, and therefore they could not be all part of the tentacles. □
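
The head/tentacle split of Example 3 can be recomputed mechanically. The sketch below is only an illustration under stated assumptions: atoms are encoded as (predicate, argument-tuple) pairs, the query is transcribed from the example as read here, and `apply_mapping` is a hypothetical helper, not notation from the paper. It applies $h$ to $Q^+$, collects as head the atoms whose variables all lie in $V_\delta = \{X, Y, Z\}$, and treats the rest as tentacles. It does not check $[V_\delta]$-acyclicity.

```python
def apply_mapping(h, atoms):
    # Apply a variable mapping to every atom; unmapped variables stay fixed.
    return {(p, tuple(h.get(t, t) for t in args)) for p, args in atoms}

Q = {
    ("r", ("X", "Y")), ("r", ("X", "Z")), ("r", ("Y", "Z")),
    ("r", ("Z", "V1")), ("r", ("V1", "V2")), ("r", ("V2", "V3")),
    ("r", ("V3", "V4")), ("r", ("V4", "V5")), ("r", ("V1", "V6")),
    ("r", ("V6", "V5")), ("r", ("V5", "V7")), ("r", ("Z", "U1")),
    ("s", ("U1", "U2", "U3")), ("r", ("U3", "U4")), ("r", ("U3", "U5")),
    ("r", ("U4", "U5")),
}
Q_plus = Q | {("s", ("U3", "U4", "U5"))}

h = {"V6": "V2", "V4": "V3", "V5": "V3", "V7": "V3"}
V_delta = {"X", "Y", "Z"}

image = apply_mapping(h, Q_plus)                      # h(Q+)
head = {a for a in image if set(a[1]) <= V_delta}     # H
tentacles = image - head                              # T = h(Q+) - H

for atom in sorted(head):
    print("head:", atom)
for atom in sorted(tentacles):
    print("tentacle:", atom)
```

Several atoms of $Q^+$ collapse under $h$ (for instance, three atoms are mapped to `r(V3, V3)`), so $h(Q^+)$ is smaller than $Q^+$.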

The two following lemmata are auxiliary technical results. 

Lemma 34. Let $Q$ be a Boolean CQ, and let $U$ be a (possibly infinite) $[A]$-acyclic
instance, where $A \subseteq \mathit{dom}(U)$. Assume $U \models Q$, i.e., there is a homomorphism
$f : \mathit{dom}(Q) \to \mathit{dom}(U)$ with $f(Q) \subseteq U$. Then:

3 This definition corrects and supersedes the one given in the conference version which was 
appropriate for relational schemata of arity 2 only. The present definition and the subsequent 
proofs work for schemes of arbitrary arities. 

4 The join graph has the query atoms as nodes, and has an arc between two atoms iff they 
share at least one argument. 






Figure 2: Squid decomposition from Example 3. Atoms in $h(Q^+)$ are shown.
[Figure not reproduced. The atoms recoverable from the tentacles are $r(Z,V_1)$,
$r(V_1,V_2)$, $r(V_2,V_3)$, $r(V_3,V_3)$, $r(Z,U_1)$, $s(U_1,U_2,U_3)$, $s(U_3,U_4,U_5)$,
$r(U_3,U_4)$, $r(U_3,U_5)$, $r(U_4,U_5)$.]



(1) There is an $[A]$-acyclic subset $W \subseteq U$ such that: (i) $f(Q) \subseteq W$, and
(ii) $|W| < 2|Q|$.

(2) There is a Boolean CQ $Q^+$ such that $Q^+$ is a superset of $Q$ and $|Q^+| <
2|Q|$, and a homomorphism $g$ such that $g(Q^+) = W$ and $g$ extends $f$, i.e.,
for each variable $X$ of $Q$, we have $g(X) = f(X)$.

Proof. 

Part (1). By hypothesis (see footnote 5), $U$ is $[A]$-acyclic and $f : \mathit{dom}(Q) \to \mathit{dom}(U)$ with
$f(Q) \subseteq U$. Since $U$ is $[A]$-acyclic, it has a (possibly infinite) $[A]$-join forest
$T = (V, E, \lambda)$. We assume, without loss of generality, that distinct vertices $u, v$
of $T$ have different labels, i.e., $\lambda(u) \neq \lambda(v)$. Let $T_Q$ be the finite subforest
of $T$ that contains all ancestors in $T$ of nodes $s$ such that $\lambda(s) \in f(Q)$. Let
$F = (V', E', \lambda')$ be the forest obtained from $T$ as follows.

• $V' = \{v \in V \mid \lambda(v) \in f(Q)\} \cup K$, where $K$ is the set of all vertices of $T_Q$
that have at least two children.

• If $v, w \in V'$, then there is an edge from $v$ to $w$ in $E'$ iff $w$ is a descendant of
$v$ in $T$, and the unique shortest path from $v$ to $w$ in $T$ does not contain
any other node from $V'$.

• Finally, for each $v \in V'$, $\lambda'(v) = \lambda(v)$.

Let us define $W = \lambda'(V')$. We claim that the forest $F$ is an $[A]$-join forest of $W$.
Since Condition (1) of Definition 23 ($[S]$-join forest) is immediately satisfied, it
suffices to show Condition (2), that is, that $F$ satisfies the $[A]$-connectedness
condition. Assume that, for two distinct vertices $v_1$ and $v_2$ of $F$ and some value
$b \in \mathit{dom}(U) - A$, it holds that $b \in \mathit{dom}(\lambda'(v_1)) \cap \mathit{dom}(\lambda'(v_2))$. In order to prove
the aforementioned $[A]$-connectedness condition, we need to show that there
exists at least one path in $F$ between $v_1$ and $v_2$ (here we consider $F$ as an
undirected graph), and that every node $v \in V'$ lying on each such path is such that
$b \in \mathit{dom}(\lambda'(v))$. By construction of $F$, $v_1$ and $v_2$ are connected in $T$, and $v$ lies
on the (unique) path between $v_1$ and $v_2$ in $T$; since $T$ is an $[A]$-join forest, we
have $b \in \mathit{dom}(\lambda(v)) = \mathit{dom}(\lambda'(v))$. Thus $F$ is an $[A]$-join forest of $W$.

5 One may first be tempted to let $W = f(Q)$, but this does not work because acyclicity
(and thus also $[A]$-acyclicity) is not a hereditary property. It may well be the case that $U$ is
acyclic, while the subset $f(Q) \subseteq U$ is not. Note, however, that taking $W = f(Q)$ works in
case of arities at most 2.

Moreover, by construction of the forest $F$, the number of children of each
inner vertex of $F$ is at least 2, and $F$ has at most $|Q|$ leaves. It follows that $F$
has at most $2|Q| - 1$ vertices. Therefore $W$ is an $[A]$-acyclic set of atoms such
that $|W| < 2|Q|$ and $W \supseteq f(Q)$.
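
The contraction used in Part (1) can be sketched on a small forest. This is a minimal illustration under assumptions of ours: the forest is given as a child-to-parent dictionary (roots map to `None`), `marked` stands for the vertices whose labels lie in $f(Q)$, and the function name `contract` is hypothetical. The sketch keeps the marked vertices together with the vertices of $T_Q$ that have at least two children in $T_Q$, and links each kept vertex to its nearest kept proper ancestor.

```python
def contract(parent, marked):
    """Return (kept vertices, parent map of the contracted forest)."""
    # T_Q: the marked vertices together with all their ancestors.
    t_q = set()
    for v in marked:
        while v is not None and v not in t_q:
            t_q.add(v)
            v = parent[v]
    # Number of children of each vertex inside T_Q.
    children = {v: 0 for v in t_q}
    for v in t_q:
        if parent[v] is not None:
            children[parent[v]] += 1
    # V' = marked vertices plus branching vertices of T_Q.
    kept = set(marked) | {v for v in t_q if children[v] >= 2}
    # E': connect each kept vertex to its nearest kept proper ancestor.
    new_parent = {}
    for v in kept:
        p = parent[v]
        while p is not None and p not in kept:
            p = parent[p]
        new_parent[v] = p
    return kept, new_parent

# A chain 0 -> 1 -> 2, then 2 branches to 3 and 4, and 3 -> 5.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 2, 5: 3}
marked = {4, 5}
kept, new_parent = contract(parent, marked)
print(sorted(kept), new_parent)
```

In this toy run the contracted forest has three vertices for two marked ones, matching the $2|Q| - 1$ bound stated in the proof.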

Part (2). $Q$ can be extended to $Q^+$ as follows. For each atom $r(t_1, \ldots, t_k)$ in
$W - f(Q)$, add to $Q$ a new query atom $r(\xi_1, \ldots, \xi_k)$ such that for each $1 \le i \le k$,
$\xi_i$ is a newly invented variable that follows lexicographically all those in $\mathit{var}(Q)$.
Obviously, $W \models Q^+$ and thus there is a homomorphism $g$ extending $f$, such
that $g(Q^+) = W$. Moreover, we have by construction $|Q^+| < 2|Q|$. □

Lemma 35. Let $G$ be an $[A]$-acyclic instance. Let $G'$ be an instance obtained
from $G$ by eliminating a set $S$ of atoms from $G$ where $\mathit{dom}(S) \subseteq A$. Then $G'$
is $[A]$-acyclic.

Proof. If $T = (V, E, \lambda)$ is an $[A]$-join forest for $G$, then an $[A]$-join forest $T'$ for
$G'$ can be straightforwardly obtained from $T$ by repeatedly eliminating each
vertex $v$ from $T$ where $\lambda(v) \in S$. Since by construction each atom $e$ eliminated
from $G$ is such that $\mathit{dom}(e) \subseteq A$, for every value $b \in \mathit{dom}(G) - A$, the
node $u \in V$ such that $\lambda(u) = e$ cannot belong to the induced (connected)
subtree $\{v \in V \mid b \in \mathit{dom}(\lambda(v))\}$. We immediately get that $G'$ enjoys the
$[A]$-connectedness property. □

The following Lemma will be used as a main tool in the subsequent com- 
plexity analysis. 

Lemma 36 (Squid Lemma). Let $\Sigma$ be a weakly guarded set of TGDs on a
schema $\mathcal{R}$, $D$ a ground instance for $\mathcal{R}$, and $Q$ a Boolean conjunctive query.
Then $\mathit{chase}(D,\Sigma) \models Q$ iff there is a squid decomposition $\delta = (Q^+, h, H, T)$
and a homomorphism $\theta : \mathit{dom}(h(Q^+)) \to \mathit{dom}(\mathit{chase}(D,\Sigma))$ such that: (i)
$\theta(H) \subseteq \mathit{chase}^{\bot}(D,\Sigma)$, and (ii) $\theta(T) \subseteq \mathit{chase}^{+}(D,\Sigma)$.

Proof.

"If". If there is a squid decomposition $\delta = (Q^+, h, H, T)$ of $Q$ and if there is
a homomorphism $\theta$ as described, then the composition $\theta \circ h$ is a homomorphism
such that $(\theta \circ h)(Q^+) = \theta(h(Q^+)) \subseteq \mathit{chase}(D,\Sigma)$. Hence, $\mathit{chase}(D,\Sigma) \models Q^+$,
and by Lemma 32, $\mathit{chase}(D,\Sigma) \models Q$ follows.

"Only if". Assume $U = \mathit{chase}(D,\Sigma) \models Q$. Then, there exists a
homomorphism $f : \mathit{var}(Q) \to \mathit{dom}(U)$ with $f(Q) \subseteq \mathit{chase}(D,\Sigma)$. By Lemma 26,
$\mathit{chase}^{+}(D,\Sigma)$ is $[\mathit{dom}(D)]$-acyclic. By Lemma 34 it then follows that there
exists a Boolean query $Q^+$ with fewer than $2|Q|$ atoms, such that all atoms of $Q$ are
also contained in $Q^+$, and a homomorphism $g : \mathit{dom}(Q^+) \to \mathit{dom}(U)$ with
$g(Q^+) \subseteq U$, such that $g(Q^+)$ is $[\mathit{dom}(D)]$-acyclic.

Partition $\mathit{var}(Q^+)$ into two sets $\mathit{var}^{\bot}(Q^+)$ and $\mathit{var}^{+}(Q^+)$ as follows:

• $\mathit{var}^{\bot}(Q^+) = \{x \in \mathit{var}(Q^+) \mid g(x) \in \mathit{dom}(D)\}$;

• $\mathit{var}^{+}(Q^+) = \mathit{var}(Q^+) - \mathit{var}^{\bot}(Q^+)$.

Define a mapping $h : \mathit{var}(Q^+) \to \mathit{var}(Q^+)$ as follows. For each $x \in \mathit{var}(Q^+)$,
let $h(x)$ be the lexicographically first variable in the set $\{y \in \mathit{var}(Q^+) \mid g(y) =
g(x)\}$. Let us define $V_\delta$ as $V_\delta = h(\mathit{var}^{\bot}(Q^+))$. Moreover, let $H$ be the set
of all those atoms $a$ of $h(Q^+)$ such that $\mathit{var}(a) \subseteq V_\delta = h(\mathit{var}^{\bot}(Q^+))$, and
let $T = h(Q^+) - H$. Note that, by definition of $H$, $g(H) \subseteq \mathit{chase}^{\bot}(D,\Sigma)$,
and by definition of $T$, $g(T) \subseteq \mathit{chase}^{+}(D,\Sigma)$. Let $\theta$ be the restriction of $g$ to
$\mathit{dom}(h(Q^+))$. Clearly, $\theta$, $h$, $H$, and $T$ fulfill the conditions (i) and (ii) of the
statement of this lemma. It thus remains to prove that $\delta = (Q^+, h, H, T)$ is
actually a squid decomposition of $Q$. For this, we only need to show that $T$ is
$[V_\delta]$-acyclic. To prove this, first observe that $\theta$ is, by construction, a bijection
between $h(\mathit{dom}(Q^+))$ and $\mathit{dom}(\theta(Q^+))$. In particular, $T \subseteq h(Q^+)$ is isomorphic
to $\theta(T)$ via the restriction $\theta_T$ of $\theta$ to $\mathit{dom}(T)$. Since $\theta_T(T) = \theta(T)$ is obtained
from the $[\mathit{dom}(D)]$-acyclic instance $\theta(Q^+)$ by eliminating only atoms all of whose
arguments are in $\mathit{dom}(D)$ (namely the atoms in $\theta(H)$), by Lemma 35, $\theta_T(T)$
is itself $[\mathit{dom}(D)]$-acyclic, and therefore trivially also $[\mathit{dom}(D) \cap \mathit{dom}(\theta_T(T))]$-acyclic.
Now, since for every $X \in \mathit{dom}(T)$ it holds that $X \in V_\delta$ iff $\theta_T(X) \in \mathit{dom}(D)$, and
since $\theta_T(T)$ is $[\mathit{dom}(D)]$-acyclic, it immediately follows that $T$ is $[V_\delta]$-acyclic.
□

5.2 Clouds and the complexity of query answering under 
weakly-guarded sets of TGDs 

To study the complexity of query answering under WGTGDs, we introduce the 
notion of cloud. 

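Definition 37. Let $\Sigma$ be a weakly-guarded set of TGDs on a schema $\mathcal{R}$, and $D$
an instance for $\mathcal{R}$. For every atom $a$ of $\mathit{chase}(D,\Sigma)$ the cloud of $a$ with respect
to $\Sigma$ and $D$, denoted $\mathit{cloud}(D,\Sigma,a)$, is the set of all atoms in $\mathit{chase}(D,\Sigma)$
whose arguments are in $\mathit{dom}(a) \cup \mathit{dom}(D)$. More formally, $\mathit{cloud}(D,\Sigma,a) =
\{b \in \mathit{chase}(D,\Sigma) \mid \mathit{dom}(b) \subseteq \mathit{dom}(a) \cup \mathit{dom}(D)\}$. Notice that for every atom
$a \in \mathit{chase}(D,\Sigma)$ we have $D \subseteq \mathit{cloud}(D,\Sigma,a)$. Moreover, we define

$$\mathit{clouds}(D,\Sigma) = \{\mathit{cloud}(D,\Sigma,a) \mid a \in \mathit{chase}(D,\Sigma)\}$$
$$\mathit{clouds}^{+}(D,\Sigma) = \{(a, \mathit{cloud}(D,\Sigma,a)) \mid a \in \mathit{chase}(D,\Sigma)\}$$

A set $S \subseteq \mathit{cloud}(D,\Sigma,a)$ is called a subcloud of $a$ (with respect to $\Sigma$ and $D$).
The set of all subclouds of an atom $a$ is denoted by $\mathit{subclouds}(D,\Sigma,a)$. Finally,
we define $\mathit{subclouds}^{+}(D,\Sigma) = \{(a, C) \mid a \in \mathit{chase}(D,\Sigma) \wedge C \subseteq \mathit{cloud}(D,\Sigma,a)\}$.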
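
As an illustration of Definition 37, the cloud of an atom can be computed by a simple filter once a finite set of chase atoms is available. The sketch below assumes a hand-written finite fragment standing in for the (possibly infinite) chase, atoms encoded as (predicate, argument-tuple) pairs, and a hypothetical function name `cloud`; labeled nulls are written `z1`, `z2`.

```python
def cloud(chase_atoms, db_domain, a):
    """Atoms of `chase_atoms` whose arguments all lie in dom(a) or dom(D)."""
    allowed = set(a[1]) | set(db_domain)
    return {b for b in chase_atoms if set(b[1]) <= allowed}

db_domain = {"c1", "c2"}
# Hand-written finite fragment of a chase, used only for illustration.
fragment = {
    ("r", ("c1", "c2")),   # database atom
    ("s", ("c2", "z1")),
    ("t", ("z1", "c1")),
    ("s", ("z1", "z2")),
    ("t", ("z2", "z2")),
}

a = ("s", ("c2", "z1"))
print(sorted(cloud(fragment, db_domain, a)))
```

Atoms mentioning the null `z2` are excluded from the cloud of `s(c2, z1)`, while every database atom is included, in line with $D \subseteq \mathit{cloud}(D,\Sigma,a)$.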

Definition 38. Let $D$ be an instance for a schema $\mathcal{R}$. Let $\alpha$ and $\beta$ be two
constructs consisting each of one atom of $HB(D)$, or a set of atoms of $HB(D)$,
or an atom paired with a set of atoms of $HB(D)$. We say that $\alpha$ and $\beta$ are
$D$-isomorphic, denoted $\alpha \simeq_D \beta$, or simply $\alpha \simeq \beta$ in case $D$ is understood, iff there
exists a bijection (i.e., a bijective homomorphism; see footnote 6) $f : \mathit{dom}(\alpha) \to \mathit{dom}(\beta)$ such
that $f(\alpha) = \beta$.

6 We remind that, by definition, the restriction of a homomorphism to dom(D) is the 
identity homomorphism. 






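Example 4. If $a, b \in \mathit{dom}(D)$ and $\zeta_1, \zeta_2, \zeta_3, \zeta_4 \notin \mathit{dom}(D)$, we
have: $p(a, \zeta_1, \zeta_2) \simeq p(a, \zeta_3, \zeta_4)$ and $(p(a, \zeta_3), \{g(a, \zeta_3), q(a, \zeta_3), r(\zeta_3)\}) \simeq
(p(a, \zeta_1), \{g(a, \zeta_1), q(a, \zeta_1), r(\zeta_1)\})$. Differently, $p(a, \zeta_1, \zeta_2) \not\simeq p(a, \zeta_1, \zeta_1)$ and
$p(a, \zeta_1, \zeta_2) \not\simeq p(\zeta_3, \zeta_1, \zeta_1)$. □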
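
For single atoms, $D$-isomorphism can be tested with one pass over the argument positions. The following sketch covers only the single-atom case of Definition 38 (not sets of atoms or atom/set pairs) and relies on the fact, recalled in footnote 6, that homomorphisms are the identity on $\mathit{dom}(D)$. The encoding of atoms and the function name `d_isomorphic` are assumptions made for this illustration.

```python
def d_isomorphic(a, b, db_domain):
    """Check whether two atoms are D-isomorphic (single-atom case only)."""
    (p, xs), (q, ys) = a, b
    if p != q or len(xs) != len(ys):
        return False
    forward, backward = {}, {}
    for x, y in zip(xs, ys):
        # Elements of dom(D) must be mapped to themselves.
        if (x in db_domain or y in db_domain) and x != y:
            return False
        # The mapping must be a function in both directions (a bijection).
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True

dom_D = {"a", "b"}
print(d_isomorphic(("p", ("a", "z1", "z2")), ("p", ("a", "z3", "z4")), dom_D))
print(d_isomorphic(("p", ("a", "z1", "z2")), ("p", ("a", "z1", "z1")), dom_D))
print(d_isomorphic(("p", ("a", "z1", "z2")), ("p", ("z3", "z1", "z1")), dom_D))
```

The second pair fails because two distinct nulls would be merged, and the third because a constant of $\mathit{dom}(D)$ would be mapped to a null.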

Lemma 39. Given an instance $D$ for a schema $\mathcal{R}$, the $D$-isomorphism relation
$\simeq$ is an equivalence relation.

The above lemma, whose proof we omit, allows us to define quotient sets of
atoms with values in $\mathit{dom}(D) \cup \Delta_N \cup \Delta_V$. The following lemma follows rather
straightforwardly from Definition 37.

Lemma 40. Let $\Sigma$ be a weakly guarded set of TGDs, and $D$ an instance for a
schema $\mathcal{R}$; also, let $|\mathcal{R}|$ be the number of predicate symbols in $\mathcal{R}$, and $w$ be the
maximum arity of a symbol in $\mathcal{R}$. The following claims hold.

(1) For every atom $a \in \mathit{chase}(D,\Sigma)$,

$$|\mathit{cloud}(D,\Sigma,a)| \le |\mathcal{R}| \cdot (|\mathit{dom}(D)| + w)^w,$$

hence $\mathit{cloud}(D,\Sigma,a)$ is polynomial in size in case the arity $w$ is fixed, and
exponential otherwise (assuming that $|\mathit{dom}(D)| \ge 2$).

(2) For each atom $a \in \mathit{chase}(D,\Sigma)$,

$$|\mathit{subclouds}(D,\Sigma,a)| \le 2^{|\mathcal{R}| \cdot (|\mathit{dom}(D)| + w)^w}.$$

(3) $|\mathit{clouds}(D,\Sigma)/{\simeq}| \le 2^{|\mathcal{R}| \cdot (|\mathit{dom}(D)| + w)^w}$, i.e., there are at most
exponentially many possible clouds or subclouds in total in a chase in case the
arity $w$ is fixed. Similarly, we get:

(4) $|\mathit{clouds}^{+}(D,\Sigma)/{\simeq}| \le |\mathit{subclouds}^{+}(D,\Sigma)/{\simeq}| \le |\mathcal{R}| \cdot (|\mathit{dom}(D)| + w)^w \cdot
2^{|\mathcal{R}| \cdot (|\mathit{dom}(D)| + w)^w}$.

Proof. The four claims are proved by combinatorial arguments as follows. 

(1) All possible distinct atoms in a cloud are obtained by placing the symbols
of $a$, plus possibly symbols from $\mathit{dom}(D)$, in at most $w$ arguments
of some predicate (relation) symbol in $\mathcal{R}$. The number of symbols to be
placed is at most $|\mathit{dom}(D)| + w$, and the predicate is chosen among $|\mathcal{R}|$
relational predicates, hence the claim.

(2) The subclouds of $a$ are exactly the subsets of $\mathit{cloud}(D,\Sigma,a)$, so their
number is $2^{|\mathit{cloud}(D,\Sigma,a)|}$; the claim then follows from (1).

(3) The maximum set of pairwise non-isomorphic clouds in the chase has a
number of elements that is bounded by the number of possible subclouds of
a fixed atom; this holds because, since labeled nulls play the role of
existentially quantified variables, what counts here is how the (at most $w$)
nulls are placed in the atoms of the cloud together with the values of
$\mathit{dom}(D)$. From this observation, the claim immediately follows.

(4) Here, we are counting the number of all possible subclouds, each associated
with its "generating" atom. The inequality holds because, once we
choose all pairwise non-isomorphic clouds, each of their possible generating
atoms can have as arguments only symbols among the $|\mathit{dom}(D)| + w$ symbols
with which we construct the subclouds.

□ 
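
To get a feeling for the magnitudes in Lemma 40, the bounds can be evaluated numerically. The sketch below simply transcribes the formulas of claims (1), (2) and (4) as read here; the function names are hypothetical and the chosen parameter values are arbitrary.

```python
def cloud_bound(num_predicates, dom_size, w):
    # Claim (1): |R| * (|dom(D)| + w)^w
    return num_predicates * (dom_size + w) ** w

def subcloud_bound(num_predicates, dom_size, w):
    # Claims (2) and (3): 2^(|R| * (|dom(D)| + w)^w)
    return 2 ** cloud_bound(num_predicates, dom_size, w)

def pair_bound(num_predicates, dom_size, w):
    # Claim (4): |R| * (|dom(D)| + w)^w * 2^(|R| * (|dom(D)| + w)^w)
    c = cloud_bound(num_predicates, dom_size, w)
    return c * 2 ** c

# Example parameters: 2 predicates, 3 database constants, maximum arity 2.
print(cloud_bound(2, 3, 2))
print(subcloud_bound(2, 3, 2))
print(pair_bound(2, 3, 2))
```

With the arity fixed, the first bound grows polynomially in $|\mathit{dom}(D)|$, while the other two grow exponentially in that polynomial.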

Definition 41. Let $a \in \mathit{chase}(D,\Sigma)$. Then, we define the following:

• $a{\downarrow}$ denotes the set of all atoms that are nodes of the subtree of $\mathrm{gcf}(D,\Sigma)$
that is rooted in $a$;

• $\nabla a = a{\downarrow} \cup \mathit{cloud}(D,\Sigma,a)$;

• if $S$ is a set of atoms in $\mathrm{gcf}(D,\Sigma)$, then $\mathrm{gcf}[a, S]$ is inductively defined
as follows ($D$ and $\Sigma$ are implicit here, so as to avoid notational clutter):
(i) $S \cup \{a\} \subseteq \mathrm{gcf}[a, S]$; (ii) if $b \in a{\downarrow}$, and $b$ is obtained via the chase rule
applied to a TGD of the form $\Phi \to \Psi$ via some homomorphism $\theta$
such that $\theta(\Psi) = b$ (we remind that we are assuming one-atom heads in
TGDs) and $\theta(\Phi) \subseteq \mathrm{gcf}[a, S]$, then $b \in \mathrm{gcf}[a, S]$.

Observe that, intuitively, $\mathrm{gcf}[a, S]$ is the set of atoms in the subtree of $\mathrm{gcf}(D,\Sigma)$
having $a$ as root and generated by using the set $S$ of atoms as (sub)cloud of
$a$ to start. The central importance of clouds in the context of weakly guarded
TGDs is that if $a$ is an atom of a generalized chase tree $\mathrm{gcf}(D,\Sigma)$, then $\nabla a$ is
determined by $\mathit{cloud}(D,\Sigma,a)$ (modulo, of course, renaming of labeled nulls).

Theorem 42. If $D$ is an instance for a schema $\mathcal{R}$, $\Sigma$ a weakly guarded set of
TGDs, and $a \in \mathit{chase}(D,\Sigma)$, then $\nabla a = \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$.

Proof. From the definition of $\nabla a$ and of $\mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$, it follows
immediately that $\mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)] \subseteq \nabla a$. It thus remains to show the converse
inclusion $\nabla a \subseteq \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$. Define $\mathit{level}_a(a) = 0$ and, for each fact
$b \in \mathit{cloud}(D,\Sigma,a) - a{\downarrow}$, $\mathit{level}_a(b) = 0$, while for every other atom $c \in a{\downarrow}$, $\mathit{level}_a(c)$
is the distance (i.e., the length of the path) from $a$ to $c$ in $\mathrm{gcf}(D,\Sigma)$.

We first show the following facts in parallel by induction on $\mathit{level}_a(b)$, for
every atom $b \in \nabla a \cup \mathit{cloud}(D,\Sigma,a)$:

(1) If $b \in \nabla a$ then $\mathit{cloud}(D,\Sigma,b) \subseteq \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$.

(2) If $b \in \nabla a$ then $b \in \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$.

The latter statement is the claim we are to prove.

Induction basis. In case $\mathit{level}_a(b) = 0$, we have either (a) $b \in \mathit{cloud}(D,\Sigma,a) -
\{a\}$, or (b) $b = a$. In case (a), $\mathit{cloud}(D,\Sigma,a) \subseteq \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$
and therefore $b \in \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$, which proves (2). Moreover, since
$b \in \mathit{cloud}(D,\Sigma,a)$, we have that $b$ cannot have more labeled nulls as
arguments than $a$, that is, $\mathit{dom}(b) - \mathit{dom}(D) \subseteq \mathit{dom}(a) - \mathit{dom}(D)$. Therefore
$\mathit{cloud}(D,\Sigma,b) \subseteq \mathit{cloud}(D,\Sigma,a) \subseteq \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$, which proves
(1). Let us then consider the case (b) $b = a$. We have $\mathit{cloud}(D,\Sigma,a) =
\mathit{cloud}(D,\Sigma,b) \subseteq \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$, which proves (1). Moreover, $b = a \in
\mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$, which proves (2).






Induction step. Assume that (1) and (2) are satisfied for all $c \in \nabla a$ such
that $\mathit{level}_a(c) \le i$. Assume $\mathit{level}_a(b) = i + 1$, with $i \ge 0$. The atom $b$ is produced by
a TGD whose guard $g$ matches some atom $b^-$ having level $i$, which is, by the
induction hypothesis, in $\mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$. The body atoms of such a TGD
then match atoms that need to be in $\mathit{cloud}(D,\Sigma,b^-)$, and thus also
in $\mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$, again by the induction hypothesis. Therefore, (2) holds
for $b$. To show (1), consider an atom $b' \in \mathit{cloud}(D,\Sigma,b)$. In case $\mathit{dom}(b') \subseteq
\mathit{dom}(b^-)$, we have $\mathit{cloud}(D,\Sigma,b') \subseteq \mathit{cloud}(D,\Sigma,b^-) \subseteq \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$.
Otherwise, $b'$ has as argument(s) at least one new labeled null that was
introduced during the generation of $b$. Given that $\Sigma$ is a weakly guarded set, and
each labeled null in $\Delta_N$ is introduced only once in the chase, there must be a
chain from $b$ to $b'$ in $\mathrm{gcf}(D,\Sigma)$ (and therefore in $\nabla b$). A simple, further
induction argument on $\mathit{level}_b(b')$ shows that all applications of TGDs in that chain
must have been fired on elements of $\mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$ only. Therefore, in
particular, $b' \in \mathrm{gcf}[a, \mathit{cloud}(D,\Sigma,a)]$. This proves (1). □

From the above theorem, we easily obtain the following result. 

Corollary 43. If $D$ is an instance for a schema $\mathcal{R}$, $\Sigma$ a weakly guarded set
of TGDs, $a, b \in \mathit{chase}(D,\Sigma)$, and $(a, \mathit{cloud}(D,\Sigma,a)) \simeq (b, \mathit{cloud}(D,\Sigma,b))$, then
$\nabla a \simeq \nabla b$.

Definition 44. Let $a$ be an atom. The canonical renaming $\mathit{can}_a : \mathit{dom}(a) \cup
\mathit{dom}(D) \to \Delta_a \cup \mathit{dom}(D)$, where $\Delta_a = \{\xi_1, \ldots, \xi_h\}$ is an ad-hoc set of labeled
nulls in $\Delta_N$, not appearing in $a$, is a substitution that maps each element of
$\mathit{dom}(D)$ into itself and maps the $i$-th argument value in lexicographic order of
$a$ which is not in $\mathit{dom}(D)$ to $\xi_i$, for all $i$ such that $1 \le i \le h$, where $h$ is the
number of values in $a$ that are not in $\mathit{dom}(D)$. If $S \subseteq \mathit{cloud}(D,\Sigma,a)$ (i.e., if
$S \in \mathit{subclouds}(D,\Sigma,a)$), then $\mathit{can}_a(S)$ is well-defined and we denote by $\mathit{can}(a, S)$
the pair $(\mathit{can}_a(a), \mathit{can}_a(S))$.

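Example 5. If $a = g(d, \zeta_1, \zeta_2, \zeta_1)$ where $d \in \mathit{dom}(D)$ and $\{\zeta_1, \zeta_2\} \cap \mathit{dom}(D) =
\emptyset$, and if $S = \{p(\zeta_1), r(\zeta_2, \zeta_2), s(\zeta_1, \zeta_2, b)\}$, where $b \in \mathit{dom}(D)$, then $\mathit{can}_a(a) =
g(d, \xi_1, \xi_2, \xi_1)$, and $\mathit{can}_a(S) = \{p(\xi_1), r(\xi_2, \xi_2), s(\xi_1, \xi_2, b)\}$. □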
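
The canonical renaming of Definition 44 is easy to sketch in code. The version below makes one assumption that should be stated loudly: "the $i$-th argument value in lexicographic order" is read as the $i$-th distinct non-database value in order of first occurrence among the arguments of $a$. The canonical nulls are written `xi1`, `xi2`, and the helper names `canonical_renaming` and `rename` are hypothetical. The sample atoms follow Example 5 as read here.

```python
def canonical_renaming(a, db_domain):
    """Map each value of atom `a` outside dom(D) to a canonical null."""
    mapping = {}
    for term in a[1]:
        if term not in db_domain and term not in mapping:
            mapping[term] = f"xi{len(mapping) + 1}"
    return mapping

def rename(mapping, atom):
    # Elements of dom(D) are not in the mapping and are left unchanged.
    p, args = atom
    return (p, tuple(mapping.get(t, t) for t in args))

dom_D = {"d", "b"}
a = ("g", ("d", "z1", "z2", "z1"))
S = {("p", ("z1",)), ("r", ("z2", "z2")), ("s", ("z1", "z2", "b"))}

can_a = canonical_renaming(a, dom_D)
print(rename(can_a, a))
print(sorted(rename(can_a, atom) for atom in S))
```

Because every atom of a subcloud only uses values from $\mathit{dom}(a) \cup \mathit{dom}(D)$, the same mapping can be applied to all atoms of $S$.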

Definition 45. If $D$ is an instance for a schema $\mathcal{R}$, $\Sigma$ is a weakly guarded set
of TGDs on $\mathcal{R}$, $S$ is a set of atoms and $a \in S$, then we write $(D, \Sigma, a, S) \models Q$
iff there exists a homomorphism $\theta$ such that $\theta(Q) \subseteq S \cup a{\downarrow}$.

The following result follows straightforwardly from Theorem 42, from
our previous definitions, and from a few additional considerations.

Corollary 46. If $D$ is an instance for a schema $\mathcal{R}$, $\Sigma$ a weakly guarded set
of TGDs, $a \in \mathit{chase}(D,\Sigma)$, and $Q$ is a Boolean conjunctive query, then the
following statements are equivalent:

(1) $\nabla a \models Q$

(2) $(D, \Sigma, a, \mathit{cloud}(D,\Sigma,a)) \models Q$

(3) $(D, \Sigma, \mathit{can}_a(a), \mathit{can}_a(\mathit{cloud}(D,\Sigma,a))) \models Q$

(4) there exists a subset $S' \subseteq \mathit{cloud}(D,\Sigma,a)$ such that
$(D, \Sigma, \mathit{can}_a(a), \mathit{can}_a(S')) \models Q$.

Towards the aim of designing an alternating algorithm that computes the
relevant parts of $\mathit{chase}(D,\Sigma)$ necessary to answer a query $Q$, we can thus
reduce any subcomputation of $\mathrm{gcf}[a, S]$, with $S \subseteq \mathit{cloud}(D,\Sigma,a)$, which we
indicate with the pair $(a, S)$, to its canonical form $\mathit{can}_a(a, S)$. Note also that
each pair $\mathit{can}_a(a, \mathit{cloud}(D,\Sigma,a))$ can be seen as the unique canonical
representative of the equivalence class

$$\{(b, \mathit{cloud}(D,\Sigma,b)) \mid (b, \mathit{cloud}(D,\Sigma,b)) \simeq (a, \mathit{cloud}(D,\Sigma,a))\}$$

of $(a, \mathit{cloud}(D,\Sigma,a))$ in $\mathit{clouds}^{+}(D,\Sigma)$. Therefore, the set

$$\{\mathit{can}_a(a, \mathit{cloud}(D,\Sigma,a)) \mid a \in \mathit{chase}(D,\Sigma)\}$$

and the quotient set $\mathit{clouds}^{+}(D,\Sigma)/{\simeq}$ are isomorphic. Note that by Lemma 40
these sets are finite and of size exponential in $|D| + |\Sigma|$ in case of a fixed schema
(otherwise, double exponential).

Now, given a database $D$ for a schema $\mathcal{R}$, a weakly guarded set of TGDs $\Sigma$
on $\mathcal{R}$, and an atomic Boolean conjunctive query $Q$, we describe an alternating
algorithm Acheck($D, \Sigma, Q$) that decides whether $D \cup \Sigma \models Q$, or, equivalently,
$\mathit{chase}(D,\Sigma) \models Q$. We assume the query $Q$ to be of the form $\exists Y_1, \ldots, Y_\ell\; p(t_1, t_2, \ldots, t_r)$,
where $p$ is a predicate symbol, and $t_1, \ldots, t_r$, with $r \ge 0$, are terms (constants
or variables) in $\mathit{dom}(D) \cup \{Y_1, Y_2, \ldots, Y_\ell\}$. Our description of Acheck will be
somewhat high-level, but complete.

The algorithm uses as basic data structures (configurations) tuples of the
form $(a, S, S', \prec, b)$. Each such tuple corresponds to a vertex of the chase tree
at some moment in time. The informal meaning of the parameters is as follows:

(1) $a$ is the root atom of the chase (sub)tree under consideration;

(2) $S$ is the set $\mathit{cloud}(D,\Sigma,a)$ or a subset thereof;

(3) $S'$ is the subset of $\mathit{cloud}(D,\Sigma,a)$ that has been established so far;

(4) $\prec$ is a total ordering of the atoms in $S$ that corresponds to the sequential
ordering in which the atoms of $S$ are actually derived in $\mathit{chase}(D,\Sigma)$;

(5) $b$ is an atom that needs to be derived. In some special cases (namely, on
the "main" path in the proof tree developed by Acheck), the algorithm
will not try to derive a specific atom, but will just try to match the input
query atom $p(t_1, \ldots, t_r)$ against the atoms of that path; in that case, we
shall use the symbol $\star$ in place of $b$.

We are now ready to describe the algorithm Acheck. The algorithm first
checks whether $D \models Q$ already holds. If so, Acheck returns "true" and halts.
Otherwise, the algorithm attempts to guess a path that contains an atom $q'$ that is
an instance of $Q$.

Initialization. We first explain the initialization. The algorithm Acheck
starts at $D$ and guesses some atom $a$ of $D$, to be expanded into a main branch
that will eventually lead to an atom $q'$ matching $Q$. To this aim, the algorithm
also guesses a set $S \subseteq \mathit{cloud}(D,\Sigma,a)$ and a total order $\prec$ on $S$, and generates a
configuration $(a, S, S', \prec, \star)$.

General notions. Assume the set $S$ is given by $S^{\bot} \cup S^{+}$, where $S^{+} =
\{a_1, a_2, \ldots, a_k\}$, where $S^{\bot} \subseteq D$, and for each $1 \le i \le k$, $a_i \notin D$. The total
order $\prec$ is such that all elements of $S^{\bot}$ precede those of $S^{+}$. Assume that $\prec$
on $S^{+}$ is defined as $a_1 \prec a_2 \prec \cdots \prec a_i \prec \cdots \prec a_k$. To prove that $S$ is actually
a subset of $\mathit{cloud}(D,\Sigma,a)$, it is necessary for Acheck to prove that each of the
atoms $a_1, \ldots, a_k$ is indeed an atom of $\mathit{chase}(D,\Sigma)$, where the proof for each
atom $a_i$ may assume as premises only atoms of $S$ that precede it according
to $\prec$. The algorithm thus finds atoms $d_1, \ldots, d_k$ in $D$ to be expanded to proof
trees for $a_1, \ldots, a_k$, respectively. For each $1 \le i \le k$, it generates a configuration
$(d_i, S, S^{\bot} \cup \{a_1, a_2, \ldots, a_{i-1}\}, \prec, a_i)$. Intuitively, each such configuration
requires proving $a_i$ assuming that $a_1, \ldots, a_{i-1}$ have all been proved. Acheck
thus simulates the sequential proof of all atoms in $\mathit{cloud}(D,\Sigma,a)$ of the original
chase via a parallel universal branching.

Expansion (existential branching). We now explain how the configuration
tree is expanded at each configuration $c$. Let $c = (a, S, S', \prec, b)$, where
$S = \{a_1, a_2, \ldots, a_k\}$, $S' = \{a_1, a_2, \ldots, a_i\}$, and $\prec$ is given by $a_1 \prec a_2 \prec \cdots \prec
a_i \prec \cdots \prec a_k$. If $b \in D$, then Acheck accepts this configuration, and does not
further expand it. If $b = \star$, then Acheck checks (via a simple existential subroutine)
whether $Q$ matches $a$, i.e., if $a$ is the homomorphic image of the input query
atom $p(t_1, \ldots, t_r)$. If so, Acheck accepts $c$ and does not further expand it. In
case $b \neq \star$, Acheck checks whether $b = a$. If so, Acheck accepts the configuration
$c$ and does not further expand it. Otherwise, the configuration tree is expanded
as follows. Acheck guesses a TGD $\rho \in \Sigma$, where $\rho = \Phi \to \Psi$, whose guard $g$
matches $a$ via some substitution $\theta$ (that is, $\theta(g) = a$) and such that $\theta(\Phi) \subseteq S'$,
and $\theta'(\Psi)$ is the new atom generated (possibly containing some fresh labeled
nulls in $\Delta_N$). Before creating the actual new configuration $c_2$ from $c$, let us
present, for the sake of better intelligibility, an intermediate new configuration
$c_1$. We have $c_1 = (a_1, S_1, S'_1, \prec_1, b_1)$, where:

(a) $a_1 = \theta'(\Psi)$ is the new atom generated by the application of $\rho$ with
substitution $\theta$ (we recall that all our TGDs have a single atom in their head).

(b) $S_1$ contains $a_1$ and each atom $d$ of $S$ such that $\mathit{dom}(d) \subseteq \mathit{dom}(a_1) \cup
\mathit{dom}(D)$. Thus, in addition to the new atom $a_1$, $S_1$ inherits all atoms
that were in the subcloud $S$ of the parent configuration $c$, and that, moreover,
are compatible with the vocabulary of $a_1$. In addition, $S_1$, which
intuitively represents the cloud or a subcloud of $a_1$, may contain a set
$\mathit{newatoms}(c_1)$ of further atoms that are guessed by the Acheck algorithm,
and which must each contain at least one labeled null of $a_1$, or one domain
element $c \in \mathit{dom}(D)$ that does not occur in $S$ (otherwise they could not be new
w.r.t. $S$; of course they cannot have as arguments other nulls than those
in $a_1$).

(c) $S'_1$ is such that $S'_1 = S_1$. Intuitively, $c_1$ represents the "main" descendant
of $c$, where we assume that already all atoms of the guessed subcloud $S_1$
have been proved. As described later on, Acheck will have to generate in
parallel further configurations that actually prove the atoms of the set
$\mathit{newatoms}(c_1)$.

(d) $\prec_1$ is a total order on $S'_1$, obtained from $\prec$ by eliminating all atoms $d$ such
that $\mathit{dom}(d) \not\subseteq \mathit{dom}(a_1) \cup \mathit{dom}(D)$ (that is, all atoms in $S - S_1$), and by
placing the atoms of $\mathit{newatoms}(c_1)$ at nondeterministically chosen places,
but following all atoms from the set $\mathit{oldproved}(c_1)$, which consists of those
atoms of $S'_1$ that also occur in $S'$ (and that were thus already assumed to
be proved at the parent configuration $c$).

(e) $b_1$ is defined as $b_1 = b$.

Importantly, instead of generating the above-described configuration $c_1 =
(a_1, S_1, S'_1, \prec_1, b_1)$, the Acheck algorithm actually generates the following
configuration $c_2$, which is the canonical form $c_2 = \mathit{can}_{a_1}(c_1)$, that is:

$$c_2 = (\mathit{can}_{a_1}(a_1),\ \mathit{can}_{a_1}(S_1),\ \mathit{can}_{a_1}(S'_1),\ \mathit{can}_{a_1}(\prec_1),\ \mathit{can}_{a_1}(b_1))$$

where $\mathit{can}_{a_1}(\prec_1)$ is the total order on the atoms of $\mathit{can}_{a_1}(S'_1)$, derived from $\prec_1$.

Expansion (universal branching). After the previous expansion, the
algorithm will furthermore generate in parallel, and in a universal computation
branching, a set of auxiliary descendant configurations of $c_1$ for proving
that all the guessed atoms in $\mathit{can}_{a_1}(\mathit{newatoms}(c_1))$ are actually derivable. Let
$\mathit{can}_{a_1}(\mathit{newatoms}(c_1)) = \{n_1, \ldots, n_m\}$, and let the linear order $\prec_1$ of the set $S_1$
of $c_1$ be a concatenation of the order $\prec$, restricted to $\mathit{oldproved}(c_1)$, and the
ordered list $n_1 \prec_1 n_2 \prec_1 \cdots \prec_1 n_m$. For each $1 \le i \le m$, Acheck generates
a configuration $c_2^{(i)}$ which is the canonical form (w.r.t. $a_1$) of an intermediate
configuration $c_3^{(i)}$, that is, $c_2^{(i)} = \mathit{can}_{a_1}(c_3^{(i)})$, where

$$c_3^{(i)} = (a_1,\ S_1,\ \mathit{oldproved}(c_1) \cup \{n_1, \ldots, n_{i-1}\},\ \prec_1,\ n_i).$$

This completes the description of the Acheck algorithm.

Theorem 47. The Acheck algorithm is correct and runs in exponential time in 
case of bounded arities, and in double exponential time otherwise. 

Proof. 

Soundness. It is easy to see that the algorithm is sound with respect to the
standard chase, i.e., if Acheck($D, \Sigma, Q$) returns "true", then $\mathit{chase}(D,\Sigma) \models
Q$. In fact, the algorithm performs, modulo variable renamings which preserve
soundness according to Corollary 46, essentially nothing but chase steps starting
from $D$ and $\Sigma$, even though not necessarily in the same order as the standard
chase. Thus, each atom derived by Acheck occurs in some chase. Since every
chase computes a universal solution that is complete with respect to conjunctive
query answering, whenever Acheck returns true, $Q$ is satisfied by some chase,
and thus also by the standard chase $\mathit{chase}(D,\Sigma)$.

Completeness. The completeness of Acheck with respect to $\mathit{chase}(D,\Sigma)$
can be seen as follows. Whenever $\mathit{chase}(D,\Sigma) \models Q$, there is a finite proof of $Q$,
i.e., a finite sequence $\xi$ of generated atoms that ends with some atom $q'$ which is
an instance of $Q$. This proof can be simulated by the alternating computation
Acheck by using the following guidelines: (i) steer the main branch of Acheck
towards (a variant of) $q'$ by choosing successively the same TGDs and substitutions
$\theta$ (modulo the appropriate variable renamings) as those used in the standard
chase for the branch of $q'$; (ii) whenever a subcloud $S$ has to be chosen for
some atom $a$ by Acheck, choose the set of atoms $\mathit{cloud}(D,\Sigma,a) \cap (D \cup \mathit{atoms}(\xi))$,
modulo appropriate variable renaming; (iii) for the ordering $\prec$, always choose
the one given by $\xi$. The fact that no $Q$-instance is lost when replacing configurations
$c_i$ by their canonical versions $c'_i = \mathit{can}_{a_i}(c_i)$ is guaranteed by Corollary 46.

Computational cost. In case of fixed arity, the size of each configuration
$c$ is polynomial in the size of $\Sigma \cup D$. Thus, Acheck describes an alternating PSPACE (i.e.,
APSPACE) computation. It is well-known that APSPACE = EXPTIME. In case
the arity is variable, each configuration requires at most exponential space; the
algorithm then describes a computation in alternating EXPSPACE, which is equal
to 2EXPTIME. □

Corollary 48. Let $\Sigma$ be a weakly guarded set of TGDs, and let $D$ be an instance
for a schema $\mathcal{R}$. Then, computing $\mathit{chase}^{\bot}(D,\Sigma)$ can be done in exponential time
in case of bounded arity, and in double exponential time otherwise.

Proof. It is sufficient to start with an empty set $A$ and then cycle over each
possible ground atom $q$ of the Herbrand base $HB(D)$, and check whether
$\mathit{chase}(D,\Sigma) \models q$, and if so, add it to $A$. The result is $\mathit{chase}^{\bot}(D,\Sigma)$. The
claimed time bounds follow straightforwardly. □
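
The loop in the proof of Corollary 48 can be sketched as follows. This is only an illustration: the schema is assumed to be a dictionary from predicate names to arities, and `entails_atom` is a hypothetical stand-in for a call to Acheck on a single ground atom; here it is replaced by a lookup in a fixed set so that the sketch is runnable.

```python
from itertools import product

def herbrand_base(schema, domain):
    """Enumerate all ground atoms over `schema` with arguments from `domain`."""
    for pred, arity in sorted(schema.items()):
        for args in product(sorted(domain), repeat=arity):
            yield (pred, args)

def ground_part(schema, domain, entails_atom):
    # Start from the empty set and add every ground atom accepted by the oracle.
    result = set()
    for atom in herbrand_base(schema, domain):
        if entails_atom(atom):
            result.add(atom)
    return result

schema = {"r": 2, "p": 1}
domain = {"a", "b"}
# Stand-in oracle: pretend these are the ground atoms entailed by the chase.
derivable = {("r", ("a", "b")), ("p", ("b",))}

print(len(list(herbrand_base(schema, domain))))
print(sorted(ground_part(schema, domain, derivable.__contains__)))
```

The number of oracle calls equals the size of the Herbrand base, which is polynomial in $|\mathit{dom}(D)|$ when the arity is bounded, so the overall cost is dominated by the cost of each entailment check.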

We now show that answering general (Boolean) conjunctive queries is of the 
same complexity. To this aim, we will use squid decompositions as previously 
defined. 

Theorem 49. Let $\Sigma$ be a weakly guarded set of TGDs, $D$ an instance for a
schema $\mathcal{R}$, and $Q$ a Boolean conjunctive query. The problem of determining
whether $D \cup \Sigma \models Q$, or, equivalently, whether $\mathit{chase}(D,\Sigma) \models Q$, is in EXPTIME
in case of bounded arity, and in 2EXPTIME in general.

Proof. We construct a nondeterministic algorithm Qcheck such that
Qcheck($D, \Sigma, Q$) outputs "true" iff $D \cup \Sigma \models Q$, or, equivalently, iff
$\mathit{chase}(D,\Sigma) \models Q$. The algorithm heavily relies on the notion of squid
decompositions, and on Lemma 36. Qcheck works as follows.

(1) Qcheck computes $\mathit{chase}^{\bot}(D,\Sigma)$.

(2) Qcheck nondeterministically guesses a squid decomposition $\delta =
(Q^+, h, H, T)$ of $Q$ based on a set $V_\delta \subseteq \mathit{var}(h(Q^+))$, where $H = \{a \in
h(Q^+) \mid \mathit{var}(a) \subseteq V_\delta\}$, where $T$ is $[V_\delta]$-acyclic, and Qcheck also guesses a
substitution $\theta_0 : V_\delta \to \mathit{dom}(D)$ such that $\theta_0(H) \subseteq \mathit{chase}^{\bot}(D,\Sigma)$. Note
that this is an NP guess, because the size of $Q^+$ is at most twice the size
of $Q$.

(3) Qcheck checks whether $\theta_0$ can be extended to a homomorphism $\theta$ such
that $\theta(T) \subseteq \mathit{chase}^{+}(D,\Sigma)$. Note that by Lemma 36 this is equivalent to
$\mathit{chase}(D,\Sigma) \models Q$. Such a $\theta$ exists iff for each connected subgraph $t$ of
$\theta_0(T)$, there is a homomorphism $\theta_t$ such that $\theta_t(t) \subseteq \mathit{chase}^{+}(D,\Sigma)$. The
Qcheck algorithm thus identifies the connected components of $\theta_0(T)$. Each
such component is a $[\mathit{dom}(D)]$-acyclic conjunctive query, some of whose
arguments may contain constants from $\mathit{dom}(D)$. Each such component
can thus be represented in the form of a $[\mathit{dom}(D)]$-join tree $t$. For each such
join tree $t$, Qcheck now tests whether there exists a homomorphism $\theta_t$ such
that $\theta_t(t) \subseteq \mathit{chase}^{+}(D,\Sigma)$. This is done by the subroutine Tcheck, which
takes as arguments the TGDs, the database instance, and a connected
subgraph $t$ of $\theta_0(T)$; how Tcheck($D, \Sigma, t$) is executed is described below.

(4) Qcheck outputs "true" iff the above check (3) (which relies on its
subchecks on the $[\mathit{dom}(D)]$-join trees) has a positive result.
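
Step (3) splits $\theta_0(T)$ into connected components before calling Tcheck on each of them. The sketch below illustrates one way to compute such components, under an assumption that is ours and not stated in the text: two atoms are considered connected when they share at least one argument that is not in $\mathit{dom}(D)$. The atom encoding and the function name `components` are likewise hypothetical.

```python
def components(atoms, db_domain):
    """Group atoms into components linked by shared non-database arguments."""
    atoms = list(atoms)
    remaining = set(range(len(atoms)))
    result = []
    while remaining:
        start = remaining.pop()
        comp, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            free_i = set(atoms[i][1]) - db_domain
            for j in list(remaining):
                if free_i & (set(atoms[j][1]) - db_domain):
                    remaining.discard(j)
                    comp.add(j)
                    frontier.append(j)
        result.append({atoms[i] for i in comp})
    return result

dom_D = {"a"}
tentacle_atoms = {("r", ("a", "X")), ("s", ("X", "Y")), ("r", ("a", "Z"))}
for comp in components(tentacle_atoms, dom_D):
    print(sorted(comp))
```

In this toy input, `r(a, X)` and `s(X, Y)` share the variable `X` and form one component, while `r(a, Z)` is separate even though it mentions the same constant `a`.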

The correctness of Qcheck follows from Lemma 36. Given that step (2) is
nondeterministic, the complexity of Qcheck is in NP$^X$, i.e., NP with an oracle in
$X$, where $X$ is a complexity class that is sufficiently powerful for: (i) computing
$\mathit{chase}^{\bot}(D,\Sigma)$, and (ii) performing the tests Tcheck($D, \Sigma, t$).

We now describe the Tcheck subroutine.

General notions. Tcheck($D, \Sigma, t$) can be obtained from Acheck via the
following modifications. In addition to the data structures carried by each
configuration of Acheck, each configuration of Tcheck also maintains an array subst
of length $w$, where $w$ is the maximum predicate arity in $\mathcal{R}$. Each array element of
subst describes a substitution that replaces some element $X \in \mathit{dom}(t) - \mathit{dom}(D)$
of $t$ by some element from $\{X_1, X_2, \ldots\}$, where the $X_i$ are the new "canonical"
elements dynamically generated by Tcheck (see the description of Acheck, where
the generation of the canonical elements is done in the same way). Moreover,
each configuration of Tcheck maintains a pointer Tpoint to a vertex (i.e., atom)
of $t$, which informally points to the root of the subtree of $t$ that still needs to
be matched by descendant configurations of $c$.

Tcheck works like Acheck, but instead of nondeterministically constructing 
a main configuration path of the configuration tree such that eventually some 
atom matches the unique query atom, Tcheck nondeterministically constructs 
a main configuration (sub)tree r of the configuration tree, such that eventually 
all atoms of the join tree t will be consistently translated into some vertices of r. 
An important component of each main configuration c of Tcheck is its current 
atom a. Initially, the atom a is some nondeterministically chosen atom of D. 
For deeper main configurations of the alternating computation tree, a will take
on labels of nodes of gcf(D, Σ).

Initialization. The computation, similarly to Acheck, starts by generating
an initial configuration (a, S, S', ≺, ★, Tpoint, subst), where a is, as said,
nondeterministically chosen from the database D, where Tpoint points to the
root r of t, and where subst encodes a homomorphism μ such that μ(r) = a if
such a μ exists, and the empty substitution otherwise. This configuration will
now be the root of the main configuration tree.

Universal branching. In addition, just as in Acheck, Tcheck generates further
configurations, in a universal expansion, whose task is to check whether all
elements of S are indeed provable.

In general, the pointer Tpoint of each main configuration c points to some
atom a_q of t, which has not yet been matched. The algorithm attempts to
expand this configuration by successively guessing a subtree of configurations,
mimicking a suitable subtree of gcf(D, Σ) that satisfies the subquery of t rooted
at a_q.

Expansion. More precisely, the expansion of a main configuration c =
(a, S, S', ≺, ★, Tpoint, subst) works as follows. For a configuration c, Tcheck first
checks whether there exists a homomorphism μ such that μ(subst(a_q)) = a.

1. (μ exists.) If μ exists, we have two cases:

1.1. If a_q is a leaf of t, then the current configuration turns into an
accepting one.

1.2. If a_q is not a leaf of t, then Tcheck nondeterministically decides
whether μ (encoded by subst) is a good match, i.e., one that contributes
to a global query answer and can be expanded to map the entire tree t
into gcf(D, Σ).

1.2.1. (Good match). In case of a good match, Tcheck nondeterministically
generates for each child a_{q_i} of a_q in t a new configuration

c_i = can_{a_i}(a_i, S_i, S'_i, ≺_i, ★, Tpoint_i, subst_i)

where Tpoint_i points to a_{q_i}, and where subst_i encodes can_{a_i}(μ_i).
The atom a_i is guessed, analogously to what is done in Acheck,
by guessing some TGD ρ ∈ Σ of the form Φ → Ψ such that
the guard atom g matches a via some homomorphism θ (that
is, θ(g) = a) such that θ(Φ) ⊆ S. The cloud subsets S_i and S'_i
are chosen again as in Acheck. Intuitively, here Tcheck, having
found a good match of a_q on a, tries to match the children of a_q
in t to children (and, eventually, descendants) of a in gcf(D, Σ).
Universal branching. Of course, Tcheck, just as Acheck, generates,
in addition, auxiliary configurations in order to prove that
all atoms of S_i are actually derivable.

1.2.2. (No good match). In case no good match exists, a child configuration

c_2 = (a_2, S, S', ≺, ★, Tpoint, subst)

of c is nondeterministically created, whose first component is a
child a_2 of a, and where c_2 inherits all of its remaining components
from c. Intuitively, after having failed at matching a_q
(to which, we recall, Tpoint points) to a, Tcheck attempts to
match the same atom to some child of a in gcf(D, Σ).

2. (μ does not exist.) We again have two cases.

2.1. If a_q is a leaf of t, the configuration is rejecting.

2.2. If a_q is not a leaf of t, then Tcheck performs an existential branching

Correctness. The correctness of Tcheck can be shown along similar lines 
as the one of Acheck. An important additional point to consider for Tcheck is 
that, given that the query t is acyclic, it is actually sufficient to remember at 
each configuration c only the latest "atom" substitution subst. The correctness
of Qcheck follows, as said, from the correctness of Tcheck and from Lemma |3T>1.
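The homomorphism test performed at the start of each expansion step can be sketched as follows. This is a simplified illustration, assuming atoms are encoded as (predicate, tuple of terms) and ignoring the canonical renaming of elements that Tcheck applies; the function name and encoding are hypothetical.

```python
def match_atom(query_atom, target_atom, subst, constants):
    """Try to extend `subst` so that the query atom maps onto the target atom.

    Terms listed in `constants` must be mapped to themselves; every other
    term of the query atom is treated as a variable. Returns the extended
    substitution as a new dict, or None if no consistent extension exists.
    """
    q_pred, q_args = query_atom
    t_pred, t_args = target_atom
    if q_pred != t_pred or len(q_args) != len(t_args):
        return None
    extended = dict(subst)  # never mutate the caller's substitution
    for q_term, t_term in zip(q_args, t_args):
        if q_term in constants:
            if q_term != t_term:
                return None
        elif extended.setdefault(q_term, t_term) != t_term:
            # The variable is already bound to a different element.
            return None
    return extended
```

A returned substitution plays the role of μ in case 1 of the expansion, while a result of None corresponds to case 2.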

Computational cost. As for the complexity of Qcheck, note that in
case the arity is bounded, Tcheck runs in APSPACE = EXPTIME, and computing
chase^⊥(D, Σ) is in EXPTIME by Corollary 48. Thus, Qcheck runs in
time NP^EXPTIME = EXPTIME. In case of unbounded arities, both computing
chase^⊥(D, Σ) and running Tcheck are in 2EXPTIME, therefore Qcheck runs in
time NP^2EXPTIME = 2EXPTIME. □

By combining Theorem [3D] and Theorem |3ni we immediately get the following
complexity characterization for reasoning under weakly guarded sets of TGDs.

Theorem 50. Let Σ be a weakly guarded set of TGDs on a schema ℛ, D an
instance for ℛ, and Q a Boolean conjunctive query. Determining whether
D ∪ Σ ⊨ Q, or, equivalently, whether chase(D, Σ) ⊨ Q, is EXPTIME-complete in
case of bounded predicate arities, and even in case Σ is fixed; it is
2EXPTIME-complete in general. The same completeness results hold for the
problem of query containment under weakly guarded sets of TGDs.
6 Guarded TGDs

Let us now turn our attention to GTGDs.

Theorem 51. Let Σ be a set of GTGDs over a schema ℛ, and let D be an
instance for ℛ. Let, moreover, w denote the maximum predicate arity in ℛ, and
let |ℛ| denote the total number of predicate symbols. Then:

(1) Computing chase^⊥(Σ, D) can be done in polynomial time if both w and
|ℛ| are bounded, and thus also in case of a fixed set Σ. This problem is
in EXPTIME in case w is bounded, and in 2EXPTIME otherwise.

(2) If Q is an atomic Boolean query, then checking whether chase(Σ, D) ⊨ Q
is PTIME-complete in case both w and |ℛ| are bounded, and remains
PTIME-complete even in case Σ is fixed. This problem is EXPTIME-complete
if w is bounded and 2EXPTIME-complete in general. It remains
2EXPTIME-complete even when |ℛ| is bounded.

(3) If Q is a general conjunctive query, checking whether chase(Σ, D) ⊨ Q
is NP-complete in case both w and |ℛ| are bounded, and thus also in case of
a fixed set Σ. Checking whether chase(Σ, D) ⊨ Q is EXPTIME-complete if
w is bounded and 2EXPTIME-complete in general. It remains
2EXPTIME-complete even when |ℛ| is bounded.

(4) Query containment under GTGDs is NP-complete if both w and |ℛ|
are bounded, and even in case the set Σ of GTGDs is fixed.

(5) Query containment under GTGDs is EXPTIME-complete if w is bounded
and 2EXPTIME-complete in general. It remains 2EXPTIME-complete even
when |ℛ| is bounded.

Proof. The PTIME-hardness of checking chase(Σ, D) ⊨ Q for atomic queries Q
and for fixed Σ follows from the fact that fact inference in fully guarded
Datalog programs is PTIME-hard. In fact, in the proof of Theorem 4.4 of [55]
it is shown that fact inference from a single-rule Datalog program whose body
contains a guard atom that contains all variables is PTIME-hard.

The NP-hardness in items (3) and (4) is immediately derived from the
hardness of containment (which in turn is polynomially equivalent to query
answering) without constraints [24].

The hardness results for EXPTIME and 2EXPTIME are all derived by minor
variants of Theorem 30. However, in case |ℛ| is unbounded and w is bounded,
the tape cells of the polynomial worktape will be simulated by using
polynomially many predicate symbols. For example, the fact that in configuration v cell
5 contains a 1 may be encoded as S^(v). The details of the proof are omitted.

The membership results are proved exactly as those for weakly guarded sets
of TGDs, except that instead of using the concept of cloud, we use the similar
concept of restricted cloud. The restricted cloud rcloud(D, Σ, a) of an atom
a ∈ chase(D, Σ) is the set of all atoms b ∈ chase(D, Σ) such that dom(b) ⊆ dom(a).
By a proof that is almost identical to the one of Theorem [42] we can show that
if D is an instance, Σ a set of GTGDs, and if a ∈ chase(D, Σ), then ∇^r a =
gcf[a, rcloud(D, Σ, a)], where ∇^r a is defined as ∇^r a = {a} ∪ rcloud(D, Σ, a). It
follows that, for the main computational tasks, we can use algorithms rAcheck,
rQcheck, and rTcheck that differ from the respective original algorithms only in
that restricted clouds instead of clouds are used. However, while in case both
|ℛ| and w are bounded, a cloud (or subcloud) can still be of polynomial size
in |D ∪ Σ|, a restricted cloud rcloud(D, Σ, a) has a constant number of atoms,
and storing its canonical version can_a(rcloud(D, Σ, a)) thus requires logarithmic
space only. In total, in case both |ℛ| and w are bounded, due to the use of
restricted clouds (and subsets thereof), each configuration c of rAcheck and of
rTcheck only requires logarithmic space. Since ALOGSPACE = PTIME, the results
for items (1) and (2) for the case where both w and |ℛ| are bounded follow. The
rQcheck algorithm then describes a computation in NP^PTIME = NP, and hence
also deciding whether chase(D, Σ) ⊨ Q is in NP. Item (3) follows immediately from
Item (2) and Corollary [TO]. □
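To make the notion of restricted cloud used in this proof concrete, the following minimal sketch computes it over a finite set of atoms. The actual chase may be infinite, so the finite input set and the (predicate, tuple of terms) encoding are simplifying assumptions made only for illustration.

```python
def restricted_cloud(atoms, a):
    """Return all atoms b in `atoms` with dom(b) a subset of dom(a).

    atoms: a finite set of (predicate, args) pairs standing in for a
           fragment of chase(D, Sigma)
    a:     the reference atom, encoded in the same way
    """
    dom_a = set(a[1])
    return {b for b in atoms if set(b[1]) <= dom_a}
```

With atoms r(c, z1), s(z1), t(c, z2) and u(c), the restricted cloud of r(c, z1) consists of r(c, z1), s(z1) and u(c); the atom t(c, z2) is excluded because z2 does not occur in r(c, z1). Since at most w distinct elements occur in a, the number of atoms that can appear in such a set is bounded by a constant when both w and the number of predicates are bounded.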

Note that one of the main results of Johnson and Klug [35], namely, that
query containment under inclusion dependencies of bounded arities is
NP-complete, is a special case of Item (3) of Theorem 51.

6.1 Tighter Complexity Bounds 

The next result, which tightens parts of Theorem 51, shows that the above
EXPTIME and 2EXPTIME-completeness results hold even in case of a fixed input
database.

Theorem 52. Let Σ be a set of GTGDs on a schema ℛ. Let w denote
the maximum arity of any predicate appearing in ℛ, and let |ℛ| denote the
total number of predicate symbols. Then, for fixed databases D and for both
fixed and variable queries Q, checking whether chase(D, Σ) ⊨ Q is
EXPTIME-complete if w is bounded and 2EXPTIME-complete in general. This problem
remains 2EXPTIME-complete even when |ℛ| is bounded.

Proof. First, let us note that the upper bounds (i.e., the membership results for
EXPTIME and 2EXPTIME) are obviously inherited from Theorem 51, and that
it suffices to prove the hardness results for the cases where Q is a fixed atomic
query.

Let us start by proving that checking chase(Σ, D) ⊨ Q is EXPTIME-hard if w is
bounded. It is well known that APSPACE (alternating PSPACE) equals EXPTIME.
Notice that alternating LINSPACE is already EXPTIME-hard, so to prove our claim
it suffices to simulate the behavior of an alternating Turing machine (ATM)
M on an input I (that will be a bit string). In particular, we will show that M
accepts the input I iff chase(Σ, D) ⊨ Q.

Without loss of generality, we assume that (i) the ATM M has exactly one
accepting state, a, which is also a halting state; (ii) the initial state of M is an
existential state; (iii) M alternates at each transition between existential and
universal states; and (iv) M never tries to read beyond its tape boundaries.

Let M be defined as

M = (S, Λ, δ, q_0, a)

where S is the set of states, Λ is the tape alphabet (assumed to be {0, 1, b}),
b ∈ Λ is the blank tape symbol, δ is the transition function, defined as
δ : S × Λ → (S × Λ × {L, R, ⊥})^2 (⊥ denotes the "stay" head move, while
L and R denote "left" and "right" respectively), q_0 ∈ S is the initial state, and
a is, as said, the accepting state. Since M is an alternating TM, the set of states
S is partitioned into two sets S_∀ and S_∃ (universal and existential states,
respectively). The general idea of the encoding is that the different configurations of
M on input I of length n will be represented by fresh nulls that are generated
in the construction of the chase.

Let us now describe the signature ℛ. First, the signature contains for each
integer 1 ≤ i ≤ n the predicate head_i/1, such that head_i(c) is true iff at
configuration c the head of M is over tape cell i. We shall also have the predicates
zero_i/1, one_i/1, and blank_i/1, where zero_i(c), one_i(c) and blank_i(c) are true if
in configuration c the tape cell i contains the symbol zero, one, or blank,
respectively. Moreover, ℛ contains for each state s ∈ S a predicate state_s/1, such that
state_s(c) is true iff the state of configuration c is s. ℛ also contains a predicate
start/1, with start(c) evaluating to true only if c is the starting configuration;
a predicate config/1, which is true for every configuration given
as argument; and a predicate next/3, where next(c, c_1, c_2) is true if c_1 and c_2
are the two successor configurations of c (these two predicates are written init
and succ in the rules below). There are also predicates universal/1
and existential/1, such that universal(c) and existential(c) are true if c is a
universal (respectively, existential) configuration. Finally, there is a predicate
accept/1, such that accept(c) will be true only on accepting configurations c,
and a nullary (i.e., propositional) predicate accept which will become true iff
the Turing machine M accepts the input I.

We now describe a set Σ(M, I) of GTGDs that simulates the behaviour of
M on input I. The rules of Σ(M, I) are as follows.

1. Initial state generation rules. The following rule creates an initial state:
→ ∃X init(X). We also add a rule init(X) → config(X), stating that the
initial configuration is actually a configuration.

2. Initial configuration rules. The following set of rules encodes the tape
content of the initial configuration. For each 1 ≤ i ≤ n, if the i-th bit of
the input I is zero, then we add the rule init(X) → zero_i(X); if it is one, we
add init(X) → one_i(X), and if it is blank, we add init(X) → blank_i(X).
We also add the rule init(X) → existential(X) in order to say, without loss
of generality, that the initial configuration is an existential one. Moreover,
we add the rules init(X) → head_1(X) and init(X) → state_{s_0}(X) for
defining the initial values of the state and the head position of M on
input I.

3. Configuration generation rules. We add a rule that creates two successor
configuration identifiers for each configuration identifier. Moreover, we
add rules stating that these new configuration identifiers indeed identify
configurations:

config(X) → ∃X_1 ∃X_2 succ(X, X_1, X_2),
succ(X, Y, Z) → config(Y),
succ(X, Y, Z) → config(Z).

4. Transition rules. We show by an example how, for each transition in the
finite control, a set of transition rules is generated. Assume, for instance,
that the transition table contains a specific transition of the form
(s, 0) → ((s_1, 1, r), (s_2, 0, ℓ)); then, we assert the following rules:

state_s(X), succ(X, X_1, X_2) → state_{s_1}(X_1)
state_s(X), succ(X, X_1, X_2) → state_{s_2}(X_2).

Moreover, for each 1 ≤ i < n we have the two rules

head_i(X), zero_i(X), state_s(X), succ(X, X_1, X_2) → one_i(X_1)
head_i(X), zero_i(X), state_s(X), succ(X, X_1, X_2) → head_{i+1}(X_1),

and for each 1 < i ≤ n the two rules

head_i(X), zero_i(X), state_s(X), succ(X, X_1, X_2) → zero_i(X_2)
head_i(X), zero_i(X), state_s(X), succ(X, X_1, X_2) → head_{i−1}(X_2).

We leave the other types of transition rules as an exercise for the reader.
Note that the total number of rules added is 6n times the (constant)
number of transition rules, hence is linearly bounded by the size n of the
input string I to M.

5. Inertia rules. These rules state that tape cells in positions not under the
head keep their values. Thus, for each 1 ≤ i ≤ n and 1 ≤ j ≤ n such that
i ≠ j, we add the rules:

head_i(X), zero_j(X), succ(X, X_1, X_2) → zero_j(X_1)
head_i(X), one_j(X), succ(X, X_1, X_2) → one_j(X_1)
head_i(X), blank_j(X), succ(X, X_1, X_2) → blank_j(X_1).

6. Configuration-type rules. These rules express that the immediate
successor configurations of an existential configuration are universal, and vice
versa:

existential(X), succ(X, X_1, X_2) → universal(X_1)
existential(X), succ(X, X_1, X_2) → universal(X_2)
universal(X), succ(X, X_1, X_2) → existential(X_1)
universal(X), succ(X, X_1, X_2) → existential(X_2).

7. Acceptance rules. These recursive rules simply define when a configuration
is accepting:

state_a(X) → accept(X)
existential(X), succ(X, X_1, X_2), accept(X_1) → accept(X)
existential(X), succ(X, X_1, X_2), accept(X_2) → accept(X)
universal(X), succ(X, X_1, X_2), accept(X_1), accept(X_2) → accept(X)
init(X), accept(X) → accept.
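The effect of the acceptance rules can be illustrated by evaluating them as a least fixpoint over a finite fragment of the configuration tree. The sketch below is only an illustration under that finiteness assumption (the chase itself generates an infinite tree); the dictionary-based encoding and the function name are hypothetical.

```python
def accepting_configs(kind, succ, final):
    """Least fixpoint of the acceptance rules on a finite configuration graph.

    kind:  dict mapping a configuration to "existential" or "universal"
    succ:  dict mapping a configuration to its pair of successors
    final: set of configurations whose state is the accepting state a
    """
    accepted = set(final)
    changed = True
    while changed:
        changed = False
        for c, (c1, c2) in succ.items():
            if c in accepted:
                continue
            if kind[c] == "existential":
                ok = c1 in accepted or c2 in accepted
            else:
                ok = c1 in accepted and c2 in accepted
            if ok:
                accepted.add(c)
                changed = True
    return accepted
```

The propositional atom accept then corresponds to asking whether the initial configuration belongs to the returned set.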

This completes the description of the program Σ(M, I). Note that this program
is guarded, has predicate arity 3, can be obtained in logarithmic space from I and
the constant machine description of M, and faithfully simulates the behaviour
of the alternating linear space machine M on input I. It follows that Σ(M, I) ⊨
accept iff M accepts input I. Let thus D_0 denote the empty database, and let
Q_0 be the ground-atom query accept. We then have that Σ(M, I) ∪ D_0 ⊨ Q_0
iff M accepts input I. This shows that answering ground atom queries on fixed
databases constrained by bounded arity GTGDs is EXPTIME-hard.
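As an illustration of how mechanically the program can be produced from I, the following sketch generates the rules of items 2 and 5 as plain strings. The textual rule syntax, the function names, and the default initial state name are assumptions made for this example; the inertia generator copies symbols to the first successor only, exactly as in the three rules displayed in item 5.

```python
def initial_rules(tape, start_state="s0"):
    """Rules of item 2 for an initial tape given as a string over '0', '1', 'b'."""
    name = {"0": "zero", "1": "one", "b": "blank"}
    rules = [f"init(X) -> {name[sym]}_{i}(X)" for i, sym in enumerate(tape, start=1)]
    rules.append("init(X) -> existential(X)")
    rules.append("init(X) -> head_1(X)")
    rules.append(f"init(X) -> state_{start_state}(X)")
    return rules


def inertia_rules(n):
    """Rules of item 5: cell j keeps its symbol while the head is on cell i != j."""
    rules = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i == j:
                continue
            for sym in ("zero", "one", "blank"):
                rules.append(
                    f"head_{i}(X), {sym}_{j}(X), succ(X, X1, X2) -> {sym}_{j}(X1)"
                )
    return rules
```

Under these assumptions, the inertia family contains 3n(n − 1) rules, so this part of the program grows quadratically in n and remains polynomial in the input size.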

Let us now illustrate how we obtain the 2EXPTIME-hardness result for
guarded TGDs when arities are unbounded, but when the number |ℛ| of
predicate symbols of the signature ℛ is bounded by a constant. Let us denote
alternating EXPSPACE as usual by AEXPSPACE. Given that AEXPSPACE = 2EXPTIME,
our aim is now to simulate an AEXPSPACE Turing machine.

The problem is that to this aim we can no longer establish, as before, a
polynomial number of rules that explicitly address each worktape cell i, or each
pair of cells i, j, given that there is now an exponential number of worktape cells.
The idea is thus to encode tape cell indexes as vectors of symbols (v_1, ..., v_k)
where the value of each v_i ranges over {0, 1}. We can then define, with a
polynomial number of rules, a successor relation succ that stores pairs of indexes
as succ(v_1, ..., v_k, w_1, ..., w_k). However, there is a further difficulty: differently
from the previous proof, we now have two different types of variables: the V_i, W_i
variables representing the bits v_i, w_i in the above-described bit vectors, and the
variables for denoting configurations (which will be X, Y, Z). Given that our
rules are all guarded, we must take care that these two types of variables,
whenever they occur in a rule body, jointly occur in
some guard. To this aim, we will use a constant database D_1 that contains the
single fact zeroone(0, 1), and we will construct a "guard" relation g such that
for each vector v of k bits and its binary successor w, and for each configuration
x and its two successor configurations y and z, the relation g contains a tuple
g(v, w, x, y, z). We will make use of several auxiliary relations on our way to
construct g. A particular feature of these auxiliary relations is that each of them
will have, in addition to its other arguments, two arguments, for which we will
use the variables S_0 and S_1, whose values s_0, s_1 will be forced to take the values
0 and 1, respectively. We do this in order to have the constants 0 and 1 always
at hand in rules where such predicates appear.

Given that our database is now non-empty, we do not need to create the
initial configuration identifier via an existential rule as before. We can simply
take 0 as the identifier of this initial configuration.

zeroone(S_0, S_1) → init(S_0, S_0, S_1).

We also add:

init(S_0, S_0, S_1) → config(S_0, S_0, S_1).

We now state the new configuration generation rules.

config(X, S_0, S_1) → ∃Y ∃Z succ(X, Y, Z, S_0, S_1),
succ(X, Y, Z, S_0, S_1) → config(Y, S_0, S_1),
succ(X, Y, Z, S_0, S_1) → config(Z, S_0, S_1).

Next, we use further rules to create a relation b that contains an atom
b(v, x, y, z, s_0, s_1) for each vector v of n bits, and for each
configuration x (with successors y and z). For better intelligibility, we will use
superscripts for indicating the arity of vector variables (or constants), for
instance V^(n) and v^(n). Moreover, for i = 0 or i = 1, and 0 ≤ j ≤ k, S_i^(j)
denotes the vector of j identical S_i-components (remember that s_0 and s_1
correspond to the values 0 and 1, respectively). In case the superscript is (0),
the list is obviously the empty list. We start with the rule

succ(X, Y, Z, S_0, S_1) → b(S_0^(n), X, Y, Z, S_0, S_1),

which creates an atom b(s_0^(n), x, y, z, s_0, s_1) whose arguments contain n 0s,
followed by the list x, y, z, 0, 1, for each configuration x and its successor
configurations y and z. The following rules now will create for each triple x, y, z an
exponential number of new atoms, where each of the leading 0s in
b(s_0^(n), x, y, z, s_0, s_1) will be successively replaced by a 1 by swapping 0s to
1s in any possible way. Eventually, the chase will generate all possible prefixes
of n bits. We add for each 1 ≤ i ≤ n − 1 the following rule:

b(U_1, ..., U_{i−1}, S_0, U_{i+1}, ..., U_n, X, Y, Z, S_0, S_1) →
b(U_1, ..., U_{i−1}, S_1, U_{i+1}, ..., U_n, X, Y, Z, S_0, S_1).

We are now ready to define our "guard" relation g through a further group of
guarded rules. For each 0 ≤ r ≤ n − 1, we add:

b(U^(r), S_0, S_1^(n−r−1), X, Y, Z, S_0, S_1) →
g(U^(r), S_0, S_1^(n−r−1), U^(r), S_1, S_0^(n−r−1), X, Y, Z).

Note that the above n rules define an exponential number of successor pairs and
couple them with each triple x, y, z of state identifiers such that y and z are the
two successors of x. In particular, the relation g contains precisely all tuples
g(v, w, x, y, z), such that v is an n-ary bit vector, w is its binary successor, x
is a configuration identifier, y its first successor, and z its second successor.
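The way the b-rules and g-rules jointly enumerate bit vectors and their binary successors can be checked on small instances with the sketch below. It is an illustration only: configuration arguments are dropped, vectors are Python tuples with the most significant bit first, and it is assumed that bits at every position may be flipped and that the shared prefix length r ranges over 0, ..., n − 1.

```python
def all_bit_vectors(n):
    """Close the all-zero vector under single 0 -> 1 flips (the b-rules)."""
    start = (0,) * n
    seen = {start}
    frontier = [start]
    while frontier:
        v = frontier.pop()
        for i in range(n):
            if v[i] == 0:
                w = v[:i] + (1,) + v[i + 1:]
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
    return seen


def successor_pairs(n):
    """Pair each vector of the form U 0 1...1 with U 1 0...0 (the g-rules)."""
    pairs = set()
    for v in all_bit_vectors(n):
        for r in range(n):  # r is the length of the shared prefix U
            if v[r] == 0 and all(bit == 1 for bit in v[r + 1:]):
                w = v[:r] + (1,) + (0,) * (n - r - 1)
                pairs.add((v, w))
    return pairs
```

For n = 3 this yields all 8 vectors and the 7 pairs (v, v + 1), which matches the intended reading of g as a binary successor relation on cell indexes.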

We are now ready to simulate an AEXPSPACE Turing machine M′ over an
input string I by a set of guarded TGDs Σ(M′, I). Since this simulation is in
essence very similar to the one presented earlier in this section, we just sketch
it and point out the main differences.

For the simulation we use (in addition to the above auxiliary predicates)
predicates similar to those of our above simulation of the EXPTIME Turing machine
M. However, we only use a constant number of predicates. So, rather than
using atoms head_i(x), zero_i(x) and so on, we use head(v, x), zero(v, x) and so
on, where v is a vector of length n that takes the role of an exponential index.
Thus, for example, the equivalent of the former rule

head_i(X), zero_i(X), state_s(X), succ(X, X_1, X_2) → one_i(X_1)

is rewritten as:

g(V, W, X, X_1, X_2), head(V, X), zero(V, X), state_s(X) → one(V, X_1).

The former rule

head_i(X), zero_i(X), state_s(X), succ(X, X_1, X_2) → head_{i−1}(X_2)

becomes

g(V, W, X, X_1, X_2), head(W, X), zero(W, X), state_s(X) → head(V, X_2).

It should now be straightforward to see how the initialization rules can be
written. Informally, for copying the input I to the worktape, we place the n bits
of I on the tape by writing a rule for each such bit. We then write a recursive
program that fills all positions from n + 1 to 2^n with blanks. The details are
omitted.

The only issue that is not immediately clear is the handling of
the inertia rules. These rules deal with pairs i, j of different, and not necessarily
adjacent, tape cell positions in our former simulation. Here we have only
adjacent cell positions available so far. The problem can be solved in two different
ways, which we briefly illustrate.

First solution. We may simply modify the definition of the b predicate by
adding a second vector of n bits to the b atoms so that b-atoms actually have the
form b(v, u, x, y, z, s_0, s_1), where v and u range over all possible distinct pairs
of bit vectors of length n. This u vector is then carried over to the g atoms.
We can thus assume that the g atoms now have the form g(v, w, u, x, y, z). The
former inertia rule

head_i(X), zero_j(X), succ(X, X_1, X_2) → zero_j(X_1)

would then be rewritten as

g(V, W, U, X, X_1, X_2), head(W, X), zero(U, X) → zero(U, X_1).

Second solution. A second viable way of realizing the inertia rules is to
define two new predicates head^−(v, x) and head^+(v, x), and add recursive rules
that, starting at the neighbors of the actual head position, assert head^−(v, x)
for each position to the left of the head position of each configuration x, and
head^+(v, x) for each position to the right of the head position of
each configuration x. We leave this as a simple TGD-programming exercise
to the reader. The inertia rules then look as follows:

g(V, W, X, X_1, X_2), head^−(V, X), zero(V, X) → zero(V, X_1)
g(V, W, X, X_1, X_2), head^+(W, X), zero(W, X) → zero(W, X_1),

and so on.

What remains to be defined are the configuration and the acceptance rules.
The configuration-type rules are very similar to the ones of the previous
reduction, hence we leave them to the reader as an exercise. The acceptance rules are
as follows:

state_a(X) → accept(X)
existential(X), g(V, W, X, X_1, X_2), accept(X_1) → accept(X)
existential(X), g(V, W, X, X_1, X_2), accept(X_2) → accept(X)
universal(X), g(V, W, X, X_1, X_2), accept(X_1), accept(X_2) → accept(X)
zeroone(S_0, S_1), accept(S_0) → accept.

This completes the description of the program Σ(M′, I). Note that this
program is guarded and has a constant number of predicates. It can be obtained
in LOGSPACE from I and the constant machine description of M′, and faithfully
simulates the behaviour of the alternating exponential space machine M′ on
input I. It follows that Σ(M′, I) ⊨ accept iff M′ accepts input I. Let thus
D_1 denote the database containing the unique tuple zeroone(0, 1), and let Q_0
be the ground-atom query accept, that is, Q_0 = {accept}. We then have that
Σ(M′, I) ∪ D_1 ⊨ Q_0 iff M′ accepts input I. This shows that answering ground
atom queries on fixed databases under guarded TGDs with a fixed number of
predicate symbols, but of unbounded arity, is 2EXPTIME-hard. □

7 Polynomial Clouds Criterion 

In the previous section we have seen that, in case of bounded arity, query
answering under weakly guarded sets of TGDs is EXPTIME-complete, while query
answering under GTGDs is NP-complete. Notice that, for unrestricted queries
and databases, NP-completeness is the best we can obtain, given that it is well
known that, even in the absence of constraints, the problem D ⊨ Q is already
NP-complete [24].

In this section, we establish a criterion that can be used as a tool for
recognizing relevant cases where query answering is in NP even for weakly guarded
sets of TGDs that are not fully guarded. Note that we consider both the
setting where the weakly guarded set Σ of TGDs is fixed, and the setting where
classes of TGD sets are considered. For the latter classes, we require uniform
polynomial bounds.

Definition 53 (Polynomial Clouds Criterion). A fixed weakly guarded set Σ of
TGDs of size n satisfies the Polynomial Clouds Criterion (PCC) if both of the
following conditions are satisfied:

1. There exists a polynomial π such that for each instance D,
|clouds(Σ, D)/≃| ≤ π(|D|); in other words, there are - up to isomorphism
- only polynomially many clouds.

2. There exists a polynomial π′(·) such that for each instance D and for each
atom a:

• if a ∈ D, then cloud(D, Σ, a) can be computed in time π′(|D| · n),
and

• if a ∉ D, then cloud(D, Σ, a) can be computed in time π′(|D| · n) from
D, a, and cloud(D, Σ, b), where b is the predecessor of a in gcf(D, Σ).

We also say that Σ satisfies the PCC with respect to π and π′. Note that
in the above, n is constant and can be omitted. However, the use of n is
justified by the following. A class C of TGD sets satisfies the PCC if there
are fixed polynomials π and π′ such that each TGD set in C satisfies the
PCC uniformly with respect to π and π′.

Theorem 54. Let Σ be a fixed weakly guarded set of TGDs over a schema ℛ,
such that Σ enjoys the Polynomial Clouds Criterion. Then:

• Deciding for an instance D and an atomic Boolean conjunctive query Q
whether D ∪ Σ ⊨ Q, or, equivalently, whether chase(D, Σ) ⊨ Q, is in
PTIME.

• Deciding for an instance D and a general Boolean conjunctive query Q
whether D ∪ Σ ⊨ Q, or, equivalently, whether chase(D, Σ) ⊨ Q, is in NP.

Proof. A polynomial algorithm Acheck2 for atomic queries Q works as follows.
We start to produce the chase forest gcf(D, Σ) using the standard chase and, in
addition, immediately after having generated a node a, we compute its cloud
cloud(D, Σ, a) in polynomial time and store can_a(a, cloud(D, Σ, a)) in
a buffer that we call cloud-store. Whenever a branch arrives at a vertex b such
that can_b(b, cloud(D, Σ, b)) is already in the cloud-store, we block the branch
at b. Since there can be only a polynomial number of pairs
can_a(a, cloud(D, Σ, a)), the algorithm stops after a polynomial number of chase
steps, each one requiring only polynomial time. Now, by Corollary 021 the
cloud-store already contains all possible atoms of chase(D, Σ) and their clouds,
up to isomorphism. To check whether for an atomic query Q, chase(D, Σ) ⊨ Q
holds, it is thus sufficient to test whether some atom c occurring in the
cloud-store matches Q. In summary,
Acheck2 runs in PTIME.
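The blocking idea behind the cloud-store can be sketched generically. The code below is not the Acheck2 algorithm itself: the chase step and the canonical form are abstracted into two hypothetical callbacks, `children` and `canon`, and the sketch only shows why the exploration stops once every canonical key has been seen.

```python
from collections import deque


def explore_with_blocking(roots, children, canon):
    """Breadth-first expansion that blocks nodes whose canonical key repeats.

    roots:    iterable of start nodes (standing in for the atoms of D)
    children: function node -> iterable of child nodes (one chase step)
    canon:    function node -> hashable key, standing in for the canonical
              form of the node together with its cloud
    Returns the stored canonical keys in discovery order.
    """
    store = {}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        key = canon(node)
        if key in store:
            continue  # an isomorphic entry was already expanded: block here
        store[key] = node
        queue.extend(children(node))
    return list(store)
```

If the number of distinct canonical keys is bounded by a polynomial, as condition (1) of the PCC guarantees for clouds, at most that many nodes are expanded even when the underlying forest is infinite.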

The algorithm Qcheck2 for conjunctive queries works just like Qcheck,
except that it calls a new algorithm Tcheck2 as a subroutine instead of Tcheck.
Tcheck2 uses as input, in addition to D and Q, also the cloud-store
computed by Acheck2. We further assume that this cloud-store identifies each entry
e = can_a(a, cloud(D, Σ, a)) by a unique integer e# using O(log n) bits only.
Tcheck2 is an alternating algorithm that works essentially like Tcheck, except
for the following main modifications:

• Tcheck2 always guesses the full cloud S = cloud(D, Σ, a), instead of
possibly guessing a subcloud.

• Instead of guessing an explicit cloud cloud(D, Σ, a), however, Tcheck2
just guesses the entry number e# of the corresponding entry
can_a(a, cloud(D, Σ, a)) of the cloud-store.

• The cloud guess is verified in ALOGSPACE to be correct, using the input
instance D, and using a, e#, as well as b, e′#, where b is the main atom
of the predecessor configuration and e′ is the entry in the cloud-store
featuring can_b(b, cloud(D, Σ, b)). Note that such a verification is effectively
possible due to condition (2) of Definition 53.

• Tcheck2 only needs to compute the main configuration tree (the one whose
configurations contain ★) and does not compute the auxiliary branches.
In fact, the auxiliary branches are no longer necessary, given that the
correctness check of S is already done in a different way.

• The configurations of Tcheck2 do not need to guess or memorize linear
orders ≺ and the set S^+.

Given that Tcheck2 is an ALOGSPACE algorithm, Qcheck2 is an NP^ALOGSPACE
procedure. Given that NP^ALOGSPACE = NP^P = NP, query answering is in NP. □

Note that the Polynomial Clouds Criterion is not a syntactic one. However,
it turns out to be a useful tool that may help a great deal in proving that
query answering for specific weakly guarded sets Σ of TGDs is in NP, or even in
polynomial time for atomic queries. An application of this criterion is illustrated
in Section [TU1.

The following is an easy consequence of Theorem [51] (1). 
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Theorem 55. (1) Every set E of GTGDs satisfies the PCC. (2) For each 
constant c, the class of all GTGD sets of arity bounded by c satisfies the PCC. 

The following result can be obtained by a minor adaptation of the proof of 
Theorem [Ml 

Theorem 56. Let $\Sigma$ be a fixed weakly guarded set of TGDs which enjoys the
Polynomial Clouds Criterion, and let $k$ be a constant. Then:

(1) Deciding for an instance $D$ and for an atomic Boolean conjunctive
query $Q$ of treewidth $k$ whether $D \cup \Sigma \models Q$ or, equivalently, whether
$chase(D, \Sigma) \models Q$, is in PTIME.

(2) The same tractability result holds for acyclic Boolean conjunctive
queries.

In analogy to the PCC, one may define various other criteria based on other 
time bounds. In particular, we may define the exponential clouds criterion 
(ECC) for classes of TGD sets, which we will use in the next section, as follows: 

Definition 57 (Exponential Clouds Criterion). Let $C$ be a class of weakly
guarded TGD sets. $C$ satisfies the Exponential Clouds Criterion (ECC) if both
of the following conditions are satisfied:

1. There exists a polynomial $\pi$ such that for each instance $D$, and for each
set of TGDs $\Sigma$ in $C$ of size $n$, $|clouds(D, \Sigma)/{\simeq}| \leq 2^{\pi(|D|+n)}$.

2. There exists a polynomial $\pi'$ such that for each instance $D$, for each set
of TGDs $\Sigma$ in $C$ of size $n$, and for each atom $a$:

• if $a \in D$, then $cloud(D, \Sigma, a)$ can be computed in time $2^{\pi'(|D|+n)}$, and

• if $a \notin D$, then $cloud(D, \Sigma, a)$ can be computed in time $2^{\pi'(|D|+n)}$ from
$D$, $a$, and $cloud(D, \Sigma, b)$, where $b$ is the predecessor of $a$ in $gcf(D, \Sigma)$.

We have the following result on sets of TGDs enjoying the ECC: 

Theorem 58. For weakly guarded sets $\Sigma$ of TGDs from a class $C$ enjoying
the Exponential Clouds Criterion, deciding for an instance $D$ and a Boolean
conjunctive query $Q$ (atomic or non-atomic) whether $D \cup \Sigma \models Q$ is in EXPTIME.

Proof (sketch). The proof is very similar to the one of Theorem [Ml, the
main difference being that PTIME and ALOGSPACE are replaced by EXPTIME
and APSPACE, respectively. We then get that query answering for atomic
queries is in APSPACE = EXPTIME, and that answering non-atomic queries is
in $NP^{APSPACE} = NP^{EXPTIME} = EXPTIME$. Thus, in this case, there is no difference
between atomic and non-atomic query answering: both are in EXPTIME. □

8 TGDs with multiple-atom heads 

As we mentioned in Section 2, all complexity results proved in this paper for
TGDs with a single-atom head also carry over to the general case, where multiple
atoms may appear in rule heads. We make this more formal here.

Theorem 59. All complexity results derived in this paper for sets of TGDs whose
heads are single atoms are equally valid for sets of multi-atom head TGDs.

Proof (sketch). It is clearly sufficient to show that the upper bounds carry over
to the setting of TGDs with multiple-atom heads. We exhibit a transformation
from an arbitrary set of TGDs $\Sigma$ over a schema $\mathcal{R}$ to a set of single-atom TGDs
$\Sigma'$ over a schema $\mathcal{R}'$ that extends $\mathcal{R}$ with some auxiliary predicate symbols.

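The TGD set $\Sigma'$ is obtained from $\Sigma$ by replacing each rule of the form
$r : body(\mathbf{X}) \rightarrow head_1(\mathbf{Y}), head_2(\mathbf{Y}), \ldots, head_k(\mathbf{Y})$, where $k > 1$ and $\mathbf{Y}$ is the
set of all the variables that appear in the head (that may include part of $\mathbf{X}$),
with the following set of rules:

\[
\begin{array}{rcl}
body(\mathbf{X}) & \rightarrow & V(\mathbf{Y}) \\
V(\mathbf{Y}) & \rightarrow & head_1(\mathbf{Y}) \\
V(\mathbf{Y}) & \rightarrow & head_2(\mathbf{Y}) \\
 & \vdots & \\
V(\mathbf{Y}) & \rightarrow & head_k(\mathbf{Y}),
\end{array}
\]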

where $V$ is a fresh predicate symbol whose arity equals the number of
variables in $\mathbf{Y}$; notice also that in general not all the variables in $\mathbf{Y}$ also
appear in $\mathbf{X}$. It is straightforward to see that, except for the atoms of the form
$V(\mathbf{Y})$, $chase(D, \Sigma)$ and $chase(D, \Sigma')$ coincide. The atoms of the form $V(\mathbf{Y})$,
being introduced only in the transformation above, do not match any predicate
symbol in $Q$; hence, $chase(D, \Sigma) \models Q$ iff $chase(D, \Sigma') \models Q$.
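
The transformation can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: a TGD is encoded as a pair (body, head) of atom lists, an atom is a pair (predicate, argument tuple), and the auxiliary predicates are named V1, V2, ... (a hypothetical naming scheme that is assumed not to clash with the schema).

```python
from itertools import count

def split_heads(tgds):
    """Rewrite multi-atom-head TGDs into single-atom-head TGDs.

    A TGD is a pair (body, head) of atom lists; an atom is (predicate, args).
    """
    fresh = count(1)  # numbering of auxiliary predicates (hypothetical naming)
    out = []
    for body, head in tgds:
        if len(head) == 1:
            out.append((body, head))
            continue
        # Y: every variable occurring in the head, in order of first occurrence
        ys = tuple(dict.fromkeys(v for _, args in head for v in args))
        aux = (f"V{next(fresh)}", ys)
        out.append((body, [aux]))                     # body(X) -> V(Y)
        out.extend(([aux], [atom]) for atom in head)  # V(Y) -> head_i(Y)
    return out

# p(X) -> q(X, Z), r(Z)   (Z is existentially quantified)
rule = ([("p", ("X",))], [("q", ("X", "Z")), ("r", ("Z",))])
for body, head in split_heads([rule]):
    print(body, "->", head)
```

Note that the arity of the auxiliary predicate equals the number of head variables, which is why the bounded-arity case needs the separate argument given in the proof.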

Obviously, $\Sigma'$ can be constructed in logspace from $\Sigma$. Moreover, it is
immediate that for each conjunctive query $Q$ over the original schema $\mathcal{R}$,
$chase(D, \Sigma) \models Q$ iff $chase(D, \Sigma') \models Q$. Therefore, the extension of our
complexity results to the general case is immediate, except for the case of
bounded arity. Notice that the arity of each auxiliary predicate in the above
construction depends on the number of head variables of the corresponding
transformed TGD, which is in general not bounded.

In case of bounded-arity WGTGDs, the EXPTIME upper bound can still be
derived by the above transformation by showing that the class of programs $\Sigma'$
resulting by that transformation from arbitrary programs $\Sigma$ satisfies the
Exponential Clouds Criterion defined in Section 7. To see that for each database $D$
and each such $\Sigma'$ there are only an exponential number of clouds, notice that
every "large" atom $V(\mathbf{Y})$ is derived by a rule with a "small" weak guard $g$ in
its body, i.e., a weak guard $g$ of bounded arity. The cloud $cloud(D, \Sigma', g)$ of
this weak guard $g$ clearly determines everything below $g$ in the chase forest, in
particular, the cloud of $V(\mathbf{Y})$. Thus the set $clouds(D, \Sigma')$ of all clouds of all
atoms is determined only by the clouds of atoms of bounded arity, of which,
for immediately verifiable combinatorial reasons, there can only be singly
exponentially many. This shows that $|clouds(D, \Sigma')/{\simeq}|$ is singly exponentially
bounded, so the first condition of Definition 57 is satisfied. It is
not too hard to verify the second condition of Definition 57, too. Thus, query
answering based on bounded-arity WGTGDs is in EXPTIME. Given that GTGDs
are a subclass of WGTGDs, the same EXPTIME bound holds for bounded-arity
GTGDs, too. □

9 EGDs



In this section we deal with equality generating dependencies (EGDs), a
generalization of functional dependencies, which are in turn a generalization of key
dependencies [1].

Definition 60. Given a relational schema $\mathcal{R}$, an EGD is a first-order formula
of the form $\forall \mathbf{X}\, \Phi(\mathbf{X}) \rightarrow X_\ell = X_k$, where $\Phi(\mathbf{X})$ is a conjunction of atoms over $\mathcal{R}$,
and $X_\ell, X_k \in \mathbf{X}$. Such a dependency is satisfied in an instance $D$ if, whenever
there is a homomorphism $h$ that maps the atoms of $\Phi(\mathbf{X})$ to atoms of $D$, we
have $h(X_\ell) = h(X_k)$.
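
As a small illustration of this definition, the following Python sketch checks whether a finite instance satisfies an EGD by enumerating all homomorphisms from the EGD body. The encoding is an assumption made for the example only: facts and atoms are pairs (predicate, argument tuple), and every argument of a body atom is a variable.

```python
def homomorphisms(atoms, facts, h=None):
    """Yield every assignment of variables that maps all atoms into facts."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        # extend g consistently, or discard it on the first clash
        if all(g.setdefault(a, v) == v for a, v in zip(args, fargs)):
            yield from homomorphisms(rest, facts, g)

def satisfies_egd(facts, body, left, right):
    """True iff every homomorphism from the body equates `left` and `right`."""
    return all(h[left] == h[right] for h in homomorphisms(body, facts))

# Key dependency on the first attribute of emp: emp(I, N1), emp(I, N2) -> N1 = N2
key_body = [("emp", ("I", "N1")), ("emp", ("I", "N2"))]
ok = {("emp", (1, "ann")), ("emp", (2, "bob"))}
bad = ok | {("emp", (1, "al"))}
print(satisfies_egd(ok, key_body, "N1", "N2"))   # True
print(satisfies_egd(bad, key_body, "N1", "N2"))  # False
```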

An instance can be "repaired" with respect to EGDs as well as TGDs.
We start by defining the EGD chase rule.

EGD Chase Rule. Consider an instance $D$, and an EGD $\eta$ of the form
$\Phi(\mathbf{X}) \rightarrow X_\ell = X_k$, where $X_\ell, X_k \in \mathbf{X}$. The EGD $\eta$ is applicable to $D$ if
there is a homomorphism $h$ that maps the atoms of $\Phi(\mathbf{X})$ to atoms of $D$
and $h(X_\ell) \neq h(X_k)$. If $\eta$ is applicable and $h(X_\ell), h(X_k)$ are two distinct elements
of $dom(D)$, then the application of the EGD yields a hard constraint violation,
which in turn causes the failure of the chase, and the halting of its computation.
In such a case, the result of the chase is an inconsistent theory. If $\eta$ is applicable
and its application does not make the chase fail, the result of its application
is the replacement of all occurrences of $h(X_\ell)$ in $D$ with $h(X_k)$, if $h(X_k)$
precedes $h(X_\ell)$ in the lexicographical order. If $h(X_\ell)$ precedes $h(X_k)$, we replace
all occurrences of $h(X_k)$ with $h(X_\ell)$.
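
A single application of the EGD chase rule might be sketched as follows. The sketch rests on assumptions that go beyond the text: labeled nulls are encoded as strings starting with an underscore, every other value counts as a constant, the chase fails exactly when two distinct constants would have to be equated, and constants precede nulls in the order used to pick the surviving value.

```python
class ChaseFailure(Exception):
    """Raised when an EGD forces two distinct constants to be equal."""

def is_null(v):
    # Assumed encoding: labeled nulls are strings with a leading underscore.
    return isinstance(v, str) and v.startswith("_")

def homomorphisms(atoms, facts, h=None):
    """Yield every assignment of variables that maps all atoms into facts."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        if all(g.setdefault(a, v) == v for a, v in zip(args, fargs)):
            yield from homomorphisms(rest, facts, g)

def apply_egd_once(facts, body, left, right):
    """Apply the EGD body -> left = right at most once; return (facts, changed)."""
    for h in homomorphisms(body, facts):
        a, b = h[left], h[right]
        if a == b:
            continue
        if not is_null(a) and not is_null(b):
            raise ChaseFailure(f"cannot equate {a!r} and {b!r}")
        # keep the value that comes first (constants before nulls, then by name)
        keep, drop = sorted((a, b), key=lambda v: (is_null(v), str(v)))
        new = {(p, tuple(keep if x == drop else x for x in args))
               for p, args in facts}
        return new, True
    return set(facts), False
```

For example, with the EGD r(X, Y), r(X, Z) → Y = Z, the instance {r(a, _n1), r(a, b)} is rewritten to {r(a, b)}, whereas {r(a, b), r(a, c)} makes the call raise `ChaseFailure`.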

Notice that, in the above application of an EGD, h is a homomorphism but 
not an endomorphism; in fact, in general, h(D) is not a subset of D. 

Definition 61. Given a database $D$ for a schema $\mathcal{R}$ and two sets $\Sigma_T$ and $\Sigma_E$
of TGDs and EGDs, respectively, the chase of $D$ in the presence of $\Sigma_T$ and $\Sigma_E$,
denoted $chase(D, \Sigma_T \cup \Sigma_E)$, is computed by iteratively applying:

1. a single TGD once, according to the standard order, and 

2. the EGDs, as long as they are applicable, that is, until a fixpoint is reached. 
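
The control flow of this definition can be sketched as a loop. The two callables below are hypothetical hooks, not functions from the paper: `tgd_step` performs one TGD application (or returns None when no TGD applies), and `egd_step` performs at most one EGD application. Since the chase may be infinite, the sketch also assumes a cut-off `max_tgd_steps`.

```python
def chase_with_egds(facts, tgd_step, egd_step, max_tgd_steps=1000):
    """Alternate one TGD application with an EGD fixpoint.

    tgd_step(facts) -> facts after ONE TGD application, or None if none applies.
    egd_step(facts) -> (facts, changed) after at most ONE EGD application.
    """
    facts = set(facts)
    for _ in range(max_tgd_steps):   # cut-off: the real chase may not terminate
        nxt = tgd_step(facts)        # 1. a single TGD, once
        if nxt is None:
            return facts
        facts = set(nxt)
        changed = True
        while changed:               # 2. the EGDs, until a fixpoint is reached
            facts, changed = egd_step(facts)
    return facts

# Toy hooks: the "TGD" adds n(0), n(1), n(2); no EGD is ever applicable.
def toy_tgd_step(facts):
    n = len(facts)
    return facts | {("n", (n,))} if n < 3 else None

print(sorted(chase_with_egds({("n", (0,))}, toy_tgd_step, lambda f: (f, False))))
```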

It is well-known [55] that EGDs are troublesome when combined with TGDs, 
because even for very simple types of EGDs, such as plain key constraints, 
the implication problem for EGDs plus TGDs, as well as the query answering 
problem, are undecidable. This remains unfortunately true, even for EGDs plus 
GTGDs. In fact, even though inclusion dependencies are fully guarded TGDs, 
the implication problem, as well as query answering and query containment, are 
undecidable for key plus inclusion dependencies [2H 133 HH].

Moreover, while the result of an infinite chase using TGDs is well-defined as
the limit of an infinite, monotonically increasing sequence (or, equivalently, as
the least fixed-point of a monotonic operator), the sequence of sets obtained in
the infinite chase of a database instance under TGDs and EGDs is, in general,
neither monotonic nor convergent. Thus, even though we can define the chase
procedure for TGDs plus EGDs, it is not clear how the result of an infinite chase
involving both TGDs and EGDs should be defined. However, if the infinite chase
converges, then the result is a universal solution, as shown in [51].

For the above reasons, we cannot hope to extend the positive results for
weakly guarded sets of TGDs, or GTGDs, from the previous sections to include
arbitrary EGDs. Therefore, we looked for suitable restrictions on EGDs,
which would allow us to: (i) use the (possibly infinite) chase procedure to obtain
a query-answering algorithm, and (ii) transfer the decidability results and upper
complexity bounds derived in the previous sections to the extended formalism.

A class that fulfills both desiderata is a subclass of EGDs, which we call
innocuous (relative to a set of TGDs). They enjoy the property that query
answering is basically insensitive to them, provided that the chase does not fail.
In other terms, when $\Sigma = \Sigma_T \cup \Sigma_E$, where $\Sigma_T$ is a set of TGDs, $\Sigma_E$ a set
of EGDs, and $\Sigma_E$ is innocuous for $\Sigma_T$, we can simply ignore these EGDs in a
non-failing chase, because for relational instances $D$ such that $chase(D, \Sigma)$ does
not fail, we have that $chase(D, \Sigma)$ satisfies exactly the same set of conjunctive
queries as $chase(D, \Sigma_T)$. Recall that according to our definition of the chase in
the presence of TGDs and EGDs given in Section 3, after each application of a
TGD, all EGDs are applied exhaustively.

Definition 62 (Innocuous EGD application). Consider a (possibly infinite)
non-failing chase sequence $chase^0(D, \Sigma)$, $chase^1(D, \Sigma)$, $chase^2(D, \Sigma)$,
$chase^3(D, \Sigma)$, $\ldots$, where $D$ is an instance and $\Sigma$ a set of TGDs and EGDs.
Suppose that for a particular value $i$, $chase^{i+1}(D, \Sigma)$ is obtained from $chase^i(D, \Sigma)$
via an application of an EGD. We say that this EGD application is innocuous
if $chase^{i+1}(D, \Sigma) \subset chase^i(D, \Sigma)$.
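
The condition can be tested directly on two consecutive finite chase states. In the sketch below, facts are pairs (predicate, argument tuple) and nulls are written with a leading underscore; both conventions are assumptions of the example. Because an EGD application always changes the instance, testing set inclusion is enough.

```python
def equate(facts, drop, keep):
    """Replace every occurrence of `drop` by `keep`, as an EGD application does."""
    return {(p, tuple(keep if x == drop else x for x in args)) for p, args in facts}

def is_innocuous(before, after):
    """Innocuous: the application produces no atom that was not already present."""
    return after <= before

before = {("r", ("a", "_n")), ("r", ("a", "b"))}
print(is_innocuous(before, equate(before, "_n", "b")))    # True

before2 = before | {("s", ("_n",))}
print(is_innocuous(before2, equate(before2, "_n", "b")))  # False: s(b) is new
```

In the second case the new atom s(b) could trigger TGD applications that were not possible before, which is exactly what innocuity rules out.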

It is desirable to have innocuous EGD applications because such applications 
cannot trigger new TGD applications, i.e., TGD applications that were not 
possible before the EGD was applied. Thus, such EGDs cannot be responsible 
for perpetuating an infinite chase process. 

Definition 63. Let $\Sigma = \Sigma_T \cup \Sigma_E$, where $\Sigma_T$ is a set of TGDs and $\Sigma_E$ a set of
EGDs. $\Sigma_E$ is innocuous for $\Sigma_T$ if for every instance $D$ where $chase(D, \Sigma)$ does
not fail, each application of an EGD in the chase of $\Sigma$ on $D$ is innocuous.

Theorem 64. Let $\Sigma = \Sigma_T \cup \Sigma_E$, where $\Sigma_T$ is a set of TGDs and $\Sigma_E$ a set of
EGDs that is innocuous for $\Sigma_T$. Let $D$ be an instance such that $chase(D, \Sigma)$
does not fail. Then $\Sigma \cup D \models Q$ iff $chase(D, \Sigma_T) \models Q$.

Proof. Consider the chase of $D$ in the presence of $\Sigma$, which leads to a
possibly infinite sequence of dependency applications $(\sigma_1, h_1), (\sigma_2, h_2), (\sigma_3, h_3), \ldots$,
where each $\sigma_i$ is a dependency in $\Sigma$ and $h_i$ is the homomorphism used at step
$i$ to map the body atoms of $\sigma_i$ to some atoms of $chase^{i-1}(D, \Sigma)$. Let us define
a modified chase procedure which we call the blocking chase, and whose result
we denote as $blockchase(D, \Sigma)$. The blocking chase uses two sets: a set $B$ of
blocked atoms and a set $A$ of (unblocked) atoms. When started on a database
$D$ such that $D \models \Sigma_E$ (the case $D \not\models \Sigma_E$ is not possible as this would mean
immediate chase failure), $B$ is initialized to the empty set ($B = \emptyset$) and $A$ is
initialized to be equal to $D$. After the initialization, the blocking chase attempts
to apply the dependencies in $\Sigma_T \cup \Sigma_E$ exactly in the same order as the standard
chase, by performing the following action at the $i$-th step, while trying $(\sigma_i, h_i)$:

• If $\sigma_i$ is a TGD, and if $h_i(body(\sigma_i)) \cap B = \emptyset$, then apply $(\sigma_i, h_i)$ and add
the new atom generated by this application to $A$.

• If $\sigma_i$ is a TGD and if $h_i(body(\sigma_i)) \cap B \neq \emptyset$, then the application of $(\sigma_i, h_i)$
is blocked, and nothing is done.

• If $\sigma_i$ is an EGD, then the application of $(\sigma_i, h_i)$ proceeds as follows. Add to
$B$ all the facts that in the standard chase disappear in that step, i.e., add
to $B$ the set $chase^i(D, \Sigma) - chase^{i+1}(D, \Sigma)$. Thus, instead of eliminating
the tuple from $A$, the blocking chase simply bans it from being used by
putting it into $B$.
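
The three cases can be mirrored in a short Python sketch. It is a simplified illustration, not the construction used in the proof: it assumes that the standard chase has already been recorded as a finite trace, where a TGD step carries the image of its body and the atom it generates, and an EGD step carries the set of facts that disappear in the standard chase. These step encodings are hypothetical.

```python
def blocking_chase(database, trace):
    """Replay a recorded chase trace, blocking atoms instead of removing them.

    trace items (assumed encoding):
      ("tgd", body_image, new_atom)  - body_image is the set h_i(body(sigma_i))
      ("egd", disappeared)           - facts removed by the standard chase step
    Returns the pair (A, B) of all atoms and blocked atoms.
    """
    A, B = set(database), set()
    for step in trace:
        if step[0] == "tgd":
            _, body_image, new_atom = step
            if not (set(body_image) & B):   # no blocked atom is used
                A.add(new_atom)
            # otherwise the application is blocked and nothing is done
        else:
            _, disappeared = step
            B |= set(disappeared)           # ban the atoms instead of deleting them
    return A, B

trace = [
    ("tgd", {("p", ("a",))}, ("q", ("a", "_n"))),
    ("egd", {("q", ("a", "_n"))}),
    ("tgd", {("q", ("a", "_n"))}, ("r", ("_n",))),   # uses a blocked atom
]
A, B = blocking_chase({("p", ("a",))}, trace)
print(sorted(A - B))   # the atoms that are present and not blocked
```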

Note that, by the construction of $blockchase(D, \Sigma)$, whenever the blocking chase
encounters an EGD $\sigma_i$, the pair $(\sigma_i, h_i)$ is actually applicable, so $blockchase(D, \Sigma)$ is
well-defined. Let us use $B_i$ and $A_i$ to denote the values of $B$ and $A$ after step
$i$, respectively. Initially, $B_0 = \emptyset$ and $A_0 = D$ as explained before. Observe
that $\emptyset = B_0 \subseteq B_1 \subseteq B_2 \subseteq \cdots$ and $D = A_0 \subseteq A_1 \subseteq A_2 \subseteq \cdots$ are
monotonically increasing sequences that have least upper bounds $B^* = \bigcup_{i \geq 0} B_i$ and
$A^* = \bigcup_{i \geq 0} A_i$, respectively. Clearly, $(B^*, A^*)$ is the least fixpoint of the
transformation performed by $blockchase(D, \Sigma)$ (with respect to component-wise set
inclusion).

Let $S = A^* - B^*$. By the definition of $S$, we have: $S \models \Sigma$. Moreover, there
is a homomorphism $h$ that maps $chase(D, \Sigma_T)$ onto $S$.

Note that $h$ is the limit homomorphism of the sequence $h_1, h_2, h_3, \ldots$, and
can be defined as the set of all pairs $(x, y)$ such that there exists an $i \geq 0$ such
that $h_i(h_{i-1}(\cdots h_1(x))) = y$ and $y$ is not altered by any homomorphism $h_j$ for
$j > i$. Note that labeled nulls in $D$ are interpreted as existentially quantified.
Therefore, for any set $S'$ containing a homomorphic image of $D$, $S' \models D$ holds.
This is, in particular, true for $S$. Therefore, $S \models D \cup \Sigma$. Also, it is
well-known [51] that for each set of atoms $M$ such that $M \models D \cup \Sigma_T$, there exists a
homomorphism $h_M$ such that $h_M(chase(D, \Sigma_T)) \subseteq M$. Now assume $D \cup \Sigma \models Q$.
Then $S \models Q$ and, because $S \subseteq chase(D, \Sigma_T)$, also $chase(D, \Sigma_T) \models Q$.
Conversely, if $chase(D, \Sigma_T) \models Q$, then there is a homomorphism $g$ such that
$g(Q) \subseteq chase(D, \Sigma_T)$. Therefore, for each set of atoms $M$ such that $M \models D \cup \Sigma$,
since $h_M(chase(D, \Sigma_T)) \subseteq M$, we have $h_M(g(Q)) \subseteq M$, and it follows that $M \models Q$. □

We now come to the problem of checking, given a set of dependencies $\Sigma =
\Sigma_T \cup \Sigma_E$, where $\Sigma_T$ is a weakly guarded set of TGDs and $\Sigma_E$ are EGDs
innocuous for $\Sigma_T$, and an instance $D$, whether $chase(D, \Sigma)$ fails. First of all,
we introduce some notation. Consider an application of the EGD chase rule
for a certain EGD $\eta$, on a chase constructed up to step $i$. Let $\eta$ be of the
form $\Phi(\mathbf{X}) \rightarrow X_\ell = X_k$, and $h$ be the homomorphism that maps $\Phi(\mathbf{X})$ to
$chase^i(D, \Sigma)$ in the application. When the application causes the failure of the
chase, we have that $h(X_\ell)$ and $h(X_k)$ are distinct values in $dom(D)$; in this case
we write $chase^i(D, \Sigma) \not\models \eta$.

Lemma 65. Consider a set of dependencies $\Sigma = \Sigma_T \cup \Sigma_E$, where $\Sigma_T$ is a
weakly guarded set of TGDs and $\Sigma_E$ are EGDs that are innocuous for $\Sigma_T$, and
an instance $D$. We have that $chase(D, \Sigma)$ fails if and only if there exists a
non-negative integer $k$ such that $chase^k(D, \Sigma_T) \not\models \eta$, where $\eta \in \Sigma_E$.

Proof. 

(Only if). If there is a failure, it is finite and it happens at some step $\ell$, at
which an EGD $\eta$ is violated. This means that $chase^\ell(D, \Sigma) \not\models \eta$. A fortiori,
we have, for some $k$, $chase^k(D, \Sigma_T) \not\models \eta$, where $k$ is the step at which the
chase constructed w.r.t. $\Sigma_T$ fails. This holds because applications of EGDs that
are innocuous for $\Sigma_T$ can only remove tuples from the chase; therefore, if $\eta$
is applicable to $chase^\ell(D, \Sigma)$, then it is also applicable to $chase^k(D, \Sigma_T)$, still
causing a failure of the chase.

(If). Assume $chase^k(D, \Sigma_T) \not\models \eta$ for some positive integer $k$. With the
same argument as in the "Only if" direction of the proof, we know that
$chase^j(D, \Sigma) \not\models \eta$, for some $j \geq k$. □

Theorem 66. Consider a set of dependencies $\Sigma = \Sigma_T \cup \Sigma_E$, where $\Sigma_T$ are
GTGDs (resp., a weakly guarded set of TGDs) and $\Sigma_E$ are EGDs that are
innocuous for $\Sigma_T$, and an instance $D$. Checking whether $chase(D, \Sigma)$ fails is
decidable, and has the same complexity as query answering for GTGDs (resp.,
weakly guarded sets of TGDs) alone.

Proof. We prove the theorem by exhibiting an algorithm that checks a chase
for failure, and has the required complexity. We first introduce a new predicate
neq of arity 2, that will serve as inequality predicate. More formally, we define
the extension of neq as $dom(D) \times dom(D) - \{(d, d) \mid d \in dom(D)\}$; such an
extension can be constructed in time quadratic in $|dom(D)|$. Now, for every
EGD $\eta$ of the form $\Phi(\mathbf{X}) \rightarrow X_1 = X_2$, with $X_1, X_2 \in \mathbf{X}$, we introduce the
Boolean conjunctive query

$Q_\eta = \Phi(\mathbf{X}), neq(X_1, X_2)$.

Since no new facts over the predicate neq are introduced in the chase, it is
immediate to see that $Q_\eta$ has positive answer if and
only if there exists a non-negative integer $k$ such that $chase^k(D, \Sigma_T) \not\models \eta$. By
Lemma 65 we get a technique for checking whether $chase(D, \Sigma)$ fails. We
denote by $Q_{\eta_1}, \ldots, Q_{\eta_n}$ the queries constructed according to the EGDs $\eta_1, \ldots, \eta_n$
respectively, with $\{\eta_1, \ldots, \eta_n\} = \Sigma_E$. We evaluate $Q_{\eta_1}, \ldots, Q_{\eta_n}$: if one of such
queries has positive answer, then $chase(D, \Sigma)$ fails, otherwise it does not. □
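
The construction of neq and of the queries $Q_\eta$ can be sketched as follows. The sketch evaluates the queries on a finite set of facts that stands in for (a finite portion of) $chase(D, \Sigma_T)$; this is an assumption of the example only, since the procedures of the previous sections answer such queries without materializing the chase. As before, atoms are pairs (predicate, argument tuple), body arguments are variables, and nulls carry a leading underscore.

```python
from itertools import product

def homomorphisms(atoms, facts, h=None):
    """Yield every assignment of variables that maps all atoms into facts."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        if all(g.setdefault(a, v) == v for a, v in zip(args, fargs)):
            yield from homomorphisms(rest, facts, g)

def neq_facts(domain):
    """Extension of neq: dom(D) x dom(D) minus the diagonal (quadratic size)."""
    return {("neq", (c, d)) for c, d in product(domain, repeat=2) if c != d}

def violation_query(body, left, right):
    """Q_eta: the EGD body extended with the atom neq(left, right)."""
    return list(body) + [("neq", (left, right))]

def some_egd_violated(facts, domain, egds):
    """True iff some Q_eta has a positive answer over facts plus neq."""
    facts = set(facts) | neq_facts(domain)
    return any(
        next(homomorphisms(violation_query(*egd), facts), None) is not None
        for egd in egds
    )

key = ([("emp", ("I", "N1")), ("emp", ("I", "N2"))], "N1", "N2")
print(some_egd_violated({("emp", (1, "ann")), ("emp", (1, "al"))},
                        {1, "ann", "al"}, [key]))   # True: two constants clash
print(some_egd_violated({("emp", (1, "ann")), ("emp", (1, "_n"))},
                        {1, "ann"}, [key]))         # False: a null can be equated
```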

Let $\Sigma = \Sigma_T \cup \Sigma_E$ be as in the statement of the above theorem, let $D$ be an
arbitrary instance, and $Q$ be a query. Due to the above theorem, we can check
whether $\Sigma \cup D \models Q$ with the help of the following query-answering algorithm.

1. check whether $chase(D, \Sigma)$ fails with the algorithm described in
Theorem 66;

2. if this chase fails, then output "fail" and halt;

3. if $chase(D, \Sigma_T) \models Q$ then output "true"; otherwise output "false".

This gives us the following corollary to Theorem IM1 

Corollary 67. Answering general conjunctive queries under weakly guarded
sets of TGDs and innocuous EGDs is PTIME-reducible to answering queries of
the same class under (weakly guarded sets of) TGDs alone, and thus has the
same complexity.

10 Applications 

In this section we show the application of our results on weakly guarded sets of
TGDs to Description Logic languages (in particular those in the DL-lite family
of ontology languages), and to a formalism called F-logic Lite; we also show,
as a special case of our results, that query answering and containment under
F-logic Lite rules are NP-complete.

10.1 DL-lite



DL-lite [531 2] is a prominent family of ontology languages that enjoy tractability
of query answering. Interestingly, a restriction of GTGDs called linear TGDs
(which have exactly one body-atom and one head-atom) is able to properly
extend most DL-lite languages, as shown in [12]. The complexity of query
answering under linear TGDs is lower than that of GTGDs, and we refer the
reader to [12] for more details.

In [12] it is also shown that the language of GTGDs properly extends the
description logic $\mathcal{EL}$ (as well as its extension $\mathcal{ELI}^f$, which also includes inverse
and functional roles).

The fact that Datalog± languages capture important DL-based ontology
languages confirms that TGDs (and Datalog±) are a useful tool for ontology
modeling and querying.

10.2 F-logic Lite 

F-logic Lite is a smaller but expressive version of F-logic, a well-known formalism
introduced for object-oriented deductive databases. We refer the reader to [17]
for details about F-logic Lite. Roughly, with respect to F-logic, F-logic Lite
excludes negation and default inheritance, and allows only a
limited form of cardinality constraints. We now encode F-logic Lite using a set
of TGDs and EGDs, which we denote by $\Sigma_{FLL}$, with $\Sigma_{FLL} = \{\rho_i\}_{1 \leq i \leq 12}$.

(1) $type(O, A, T), data(O, A, V) \rightarrow member(V, T)$.

(2) $sub(C_1, C_3), sub(C_3, C_2) \rightarrow sub(C_1, C_2)$.

(3) $member(O, C), sub(C, C_1) \rightarrow member(O, C_1)$.

(4) $data(O, A, V), data(O, A, W), funct(A, O) \rightarrow V = W$.
Note that this is the only EGD in this axiomatization.

(5) $mandatory(A, O) \rightarrow \exists V\, data(O, A, V)$.
Note that this TGD has an existentially quantified variable in the head.

(6) $member(O, C), type(C, A, T) \rightarrow type(O, A, T)$.

(7) $sub(C, C_1), type(C_1, A, T) \rightarrow type(C, A, T)$.

(8) $type(C, A, T_1), sub(T_1, T) \rightarrow type(C, A, T)$.

(9) $sub(C, C_1), mandatory(A, C_1) \rightarrow mandatory(A, C)$.

(10) $member(O, C), mandatory(A, C) \rightarrow mandatory(A, O)$.

(11) $sub(C, C_1), funct(A, C_1) \rightarrow funct(A, C)$.

(12) $member(O, C), funct(A, C) \rightarrow funct(A, O)$.

Notice that the results of our paper apply to the above set of constraints, since 
the TGDs in the above set are a weakly guarded set, and the single EGD is 
innocuous, as easily verified. 

We now prove our complexity results. 

Theorem 68. Conjunctive query answering under F-logic Lite rules is NP-hard.

Proof (sketch). The proof is by reduction from the 3-COLORABILITY problem.
Encode a graph $G = (V, E)$ as a conjunctive query $Q$ which, for each edge
$(v_i, v_j)$ in $E$, has two atoms $data(X, V_i, V_j)$ and $data(X, V_j, V_i)$, where $X$ is a
unique, fixed variable. Let $D$ be the instance $D = \{data(o, r, g), data(o, g, r),
data(o, r, b), data(o, b, r), data(o, g, b), data(o, b, g)\}$. Then, $G$ is three-colorable
iff $D \models Q$, which is the case iff $D \cup \Sigma_{FLL} \models Q$. The transformation from $G$ to
$(Q, D)$ is obviously polynomial. This proves the claim. □
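
The reduction is easy to try out. The following Python sketch builds the query $Q$ for a graph given as an edge list and decides $D \models Q$ by a naive backtracking search for a homomorphism from $Q$ to $D$. The variable names `V_<vertex>` and the exhaustive search are choices made for this example and take exponential time in the worst case, in line with the NP-hardness of the problem.

```python
def homomorphisms(atoms, facts, h=None):
    """Yield every assignment of variables that maps all atoms into facts."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        g = dict(h)
        if all(g.setdefault(a, v) == v for a, v in zip(args, fargs)):
            yield from homomorphisms(rest, facts, g)

def graph_to_query(edges):
    """Two data-atoms per edge, sharing the fixed variable X."""
    query = []
    for u, v in edges:
        query.append(("data", ("X", f"V_{u}", f"V_{v}")))
        query.append(("data", ("X", f"V_{v}", f"V_{u}")))
    return query

# D holds data(o, c, d) for every pair of distinct colours c, d.
D = {("data", ("o", c, d)) for c in "rgb" for d in "rgb" if c != d}

def three_colorable(edges):
    """D |= Q iff some homomorphism maps the query into D."""
    return next(homomorphisms(graph_to_query(edges), D), None) is not None

triangle = [(1, 2), (2, 3), (1, 3)]
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(three_colorable(triangle))  # True
print(three_colorable(k4))        # False
```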

Theorem 69. Conjunctive query answering under F-logic Lite rules is in NP.

Proof (sketch). As mentioned before, we can ignore the only EGD in $\Sigma_{FLL}$
since, being innocuous, it does not interfere with query answering. Let us denote
with $\Sigma'_{FLL}$ the set of TGDs resulting from $\Sigma_{FLL}$ by eliminating rule $\rho_4$, i.e., let
$\Sigma'_{FLL} = \Sigma_{FLL} - \{\rho_4\}$. To establish membership in NP, it is sufficient to show
that:

(1) $\Sigma'_{FLL}$ is weakly guarded.

(2) $\Sigma'_{FLL}$ is such that, for every instance $D$, there are, up to $D$-isomorphisms,
polynomially many clouds; more precisely, there exists a polynomial $\pi$
such that, for every instance $D$, $|clouds(D, \Sigma'_{FLL})/{\simeq}| \leq \pi(|D|)$.

(3) There exists a polynomial $\pi'$ such that for each instance $D$ and for each
atom $a$:

• if $a \in D$, then $cloud(D, \Sigma'_{FLL}, a)$ can be computed in time $\pi'(|D|)$, and

• if $a \notin D$, then $cloud(D, \Sigma'_{FLL}, a)$ can be computed in time $\pi'(|D|)$ from
$D$, $a$, and $cloud(D, \Sigma'_{FLL}, b)$, where $b$ is the predecessor of $a$ in $gcf(D, \Sigma'_{FLL})$.

Under the above conditions, the membership in NP can be proved by exhibiting
the following. (i) An algorithm, analogous to Acheck, that constructs all
"canonical" versions of the atoms of the chase and their clouds (the latter are
stored in a "cloud store"), in polynomial time; then, checks whether an atomic
(Boolean) query is satisfied by some atom in the cloud store. (ii) An algorithm,
analogous to Qcheck, that guesses (by calling an analogous version of
Tcheck) entire clouds by guessing their index (a unique integer) in the cloud
store, and checks in alternating logarithmic space (ALOGSPACE) the correctness
of the cloud guess, by using in addition only the cloud of the main atom of the
predecessor configuration. The complexity of running this algorithm is shown
to be $NP^{ALOGSPACE} = NP$.

(1) is readily seen: the affected positions are the following: data[3],
member[1], type[1], mandatory[2], funct[2], data[1]. It is easy to see that every
rule of $\Sigma'_{FLL}$ is weakly guarded, and thus $\Sigma'_{FLL}$ is weakly guarded.
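
The list of affected positions can be recomputed mechanically. The sketch below encodes the eleven TGDs of F-logic Lite (rule (4) is the EGD and is left out) with one-letter variables, where D, E and U stand for $C_1$, $C_3$/$C_2$ and $T_1$; this encoding is an assumption of the example. It uses the usual fixpoint characterization: a head position is affected if it holds an existentially quantified variable, or a variable that occurs in the body only at affected positions. A rule is then taken to be weakly guarded if some body atom contains all body variables that occur only at affected positions.

```python
# (body, head) pairs; an atom is (predicate, string of one-letter variables)
RULES = [
    ([("type", "OAT"), ("data", "OAV")], ("member", "VT")),       # (1)
    ([("sub", "CE"), ("sub", "ED")], ("sub", "CD")),              # (2)
    ([("member", "OC"), ("sub", "CD")], ("member", "OD")),        # (3)
    ([("mandatory", "AO")], ("data", "OAV")),                     # (5), V existential
    ([("member", "OC"), ("type", "CAT")], ("type", "OAT")),       # (6)
    ([("sub", "CD"), ("type", "DAT")], ("type", "CAT")),          # (7)
    ([("type", "CAU"), ("sub", "UT")], ("type", "CAT")),          # (8)
    ([("sub", "CD"), ("mandatory", "AD")], ("mandatory", "AC")),  # (9)
    ([("member", "OC"), ("mandatory", "AC")], ("mandatory", "AO")),  # (10)
    ([("sub", "CD"), ("funct", "AD")], ("funct", "AC")),          # (11)
    ([("member", "OC"), ("funct", "AC")], ("funct", "AO")),       # (12)
]

def body_positions(body):
    """Map each body variable to the set of positions (predicate, index) it occupies."""
    pos = {}
    for pred, args in body:
        for i, v in enumerate(args, 1):
            pos.setdefault(v, set()).add((pred, i))
    return pos

def affected_positions(rules):
    affected, changed = set(), True
    while changed:
        changed = False
        for body, (hpred, hargs) in rules:
            pos = body_positions(body)
            for i, v in enumerate(hargs, 1):
                # existential variables have no body position: empty set <= affected
                if pos.get(v, set()) <= affected and (hpred, i) not in affected:
                    affected.add((hpred, i))
                    changed = True
    return affected

def is_weakly_guarded(rules):
    affected = affected_positions(rules)
    for body, _ in rules:
        need = {v for v, ps in body_positions(body).items() if ps <= affected}
        if not any(need <= set(args) for _, args in body):
            return False
    return True

print(sorted(affected_positions(RULES)))
print(is_weakly_guarded(RULES))
```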

Now let us sketch (2). Let $\Sigma''_{FLL} = \Sigma'_{FLL} - \{\rho_5\}$, i.e., the set of all TGDs of
$\Sigma'_{FLL}$ but $\rho_5$. These are all full TGDs and their application does not alter the
domain. We have $chase(D, \Sigma'_{FLL}) = chase(chase(D, \Sigma''_{FLL}), \Sigma'_{FLL})$. Let us now
have a closer look at $D^+ = chase(D, \Sigma''_{FLL})$. Clearly, $dom(D^+) = dom(D)$. For
each predicate symbol $p$, let $Rel(p)$ denote the relation consisting of all $p$-atoms
in $D^+$. Let $\Omega$ be the family of all relations that can be obtained from any of the
relations $Rel(p)$ by performing an arbitrary selection followed by some projection
(we forbid disjunctions in the selection predicate). For example, assume $c, d \in
dom(D)$; then, $Rel(data)$ will give rise to relations $\pi_{1,2}(\sigma_{\{1=c\}} Rel(data))$, and
to $\pi_2(\sigma_{\{1=d \wedge 3=c\}} Rel(data))$, and so on, where the numbers are attribute
identifiers (the notation here should be self-explanatory). Given that $D^+$ is
of size polynomial in $D$ and that the maximum arity of any relation $Rel(p)$
is 3, the set $\Omega$ is of size polynomial in $D^+$ and thus polynomial in $D$. It can
now be shown that $\Omega$ is preserved in a precise sense, when going to the final
result $chase(D^+, \Sigma'_{FLL})$. In particular, for each relation $Rel'(p)$ corresponding
to predicate $p$ in the final chase result, when performing a selection on $Rel'(p)$
that assigns fixed values not in $dom(D)$ to one or more attributes, and projecting
on the other columns, the set of all tuples of $dom(D)$-elements in the result is
a relation in $\Omega$. For example, assume that $v_5$ is a specific labeled null, then
the set of all $T \in dom(D)$ such that $member(v_5, T)$ is an element of the final
result is a set in $\Omega$; similarly, if $v_7$ and $v_8$ are new values, the set of all values
$A$ such that $data(v_7, A, v_8)$ is a relation in $\Omega$. It is easy to see that from this
it follows that $\Sigma'_{FLL}$ satisfies (2). In fact, all possible clouds are determined by
the polynomially many ways of choosing at most three elements of $\Omega$ for each
predicate. The proof of the preservation property can be done by induction on
the $i$-th new labeled null added. Roughly, for each such labeled null, created by
rule $\rho_5$, we just analyze which sets of values (or tuples) are attached to it via
rules p4, then p^, p-?, p%, pxo, and so on, and conclude that these sets were all 
already present at the next lower level, and thus, by induction hypothesis, are
in $\Omega$.
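
To make the size argument concrete, the following Python sketch enumerates a family of this kind for a single relation. It is a simplification made for illustration: only selections that fix some columns to constants of the domain are considered (the text allows arbitrary disjunction-free selections), followed by a projection on a non-empty set of columns. For arity 3 the number of (selection, projection) pairs is at most $(1 + |dom|)^3 \cdot 7$, hence polynomial in the size of the domain.

```python
from itertools import combinations, product

def omega(rel, arity, domain):
    """All relations obtained from rel by column=constant selections plus a projection."""
    family = set()
    cols = range(arity)
    for k in range(arity + 1):
        for sel_cols in combinations(cols, k):          # columns fixed by the selection
            for vals in product(domain, repeat=k):      # constants they are fixed to
                selected = [t for t in rel
                            if all(t[c] == v for c, v in zip(sel_cols, vals))]
                for m in range(1, arity + 1):
                    for proj in combinations(cols, m):  # projected columns
                        family.add(frozenset(tuple(t[c] for c in proj)
                                             for t in selected))
    return family

rel_data = {("a", "x", "c"), ("b", "x", "c")}
family = omega(rel_data, 3, {"a", "b", "c", "x"})
print(len(family))
```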

Condition (3) can be proved straightforwardly by similar arguments. □

From Theorems 68 and 69 we immediately get:

Corollary 70. Conjunctive query answering under F-logic Lite rules is
NP-complete.

11 Conclusions and related work 

In this paper we identified a large and non-trivial class of tuple-generating and
equality-generating dependencies for which the problems of containment and
answering for conjunctive queries are decidable, and provided the relevant
complexity results. Applications of this result include databases and knowledge
representation. In particular, we have shown that this class of constraints
subsumes the classical work of Johnson and Klug [55] as well as (with some extension
not detailed in this paper) the more recent results from [17]. Moreover, we are
able to capture relevant ontology formalisms in the Description Logics family,
in particular DL-lite and $\mathcal{EL}$.

Related work. The problem of query containment in the case of
non-terminating chase was addressed in the database context by Johnson and
Klug [38], where the ontological theory contains inclusion dependencies and
key dependencies of a particular form. The introduction
of the DL-Lite family of DLs by Calvanese et al. [23] 0] represented a significant
leap forward in ontological query answering, due to the expressiveness of
DL-lite languages and to their tractable data complexity (i.e., complexity where
the query and the ontology are fixed). Conjunctive query answering in DL-Lite
has the advantage of being first-order rewritable, i.e., a pair $(Q, \Sigma)$, where $Q$ is
a CQ and $\Sigma$ is a DL-Lite TBox, can be rewritten as a first-order query $Q_\Sigma$ such
that, for every instance (ABox) $D$, the answer to $Q$ against the logical theory
$D \cup \Sigma$ coincides with the answer to $Q_\Sigma$ against $D$. Since each first-order query
can be written in SQL, in practical terms this means that a pair $(Q, \Sigma)$ can be
rewritten as an SQL query over the original instance $D$.

Rewritability is widely adopted in ontology querying. The works [19, 10]
present query rewriting techniques that deal with Entity-Relationship schemata
and inclusion dependencies, respectively. The work in [53] presents a Datalog
rewriting algorithm for the expressive DL $\mathcal{ELHIO}^\neg$, which comprises a limited
form of concept and role negation, role inclusion, inverse roles, and nominals,
i.e., concepts that are interpreted as singletons; conjunctive query answering in
$\mathcal{ELHIO}^\neg$ is PTIME-complete in data complexity, and the proposed algorithm is
also optimal for other ontology languages such as DL-Lite.

Other rewriting techniques for PTIME-complete languages (in data complexity)
have been proposed for the description logic $\mathcal{EL}$ [56] [45J 02]. Another
approach worth mentioning is a combination of rewriting according to the ontology
and of expansion (chase) according to the data [51]; this technique was
introduced for DL-lite in order to tackle the performance problems that arise when
the rewriting according to the ontology is too large.

Recent works concentrate on semantic characterization of sets of TGDs [6].
The notion of first-order rewritability is tightly connected to that of finite
unification set (FUS). A FUS is semantically characterized as a set of TGDs
that enjoy the following property: for every conjunctive query $Q$, the rewriting
of $Q$ obtained by backward-chaining through unification, according to the
rules in $\Sigma$, terminates. Another semantic notion to characterize sets of TGDs
is that of bounded treewidth set (BTS), i.e., a set of TGDs such that the
chase under such TGDs has bounded treewidth. As seen in Section 3, every
weakly guarded set of TGDs is a BTS.

The Datalog± family Q3] has been proposed, with the purpose of providing
tractable query answering algorithms for more general ontology languages. In
Datalog±, the fundamental constraints are TGDs and EGDs. Clearly, TGDs
are an extension of Datalog rules; the absence of value invention (existential
quantification in the head), thoroughly discussed by Patel-Schneider and
Horrocks [55], is the main shortcoming of plain Datalog in modeling ontological
reasoning, and even conceptual data formalisms such as the Entity-Relationship
model [27]. Sets of GTGDs or weakly guarded sets of TGDs are Datalog±
ontologies. Datalog± languages easily extend the most common tractable ontology
languages; in particular, the main DL-Lite languages (see [15]). The fundamental
decidability paradigms in the Datalog± family are the following.

• Chase termination. When the chase terminates, a finite instance is
produced; obviously, by Theorem [5] query answering in such a case is
decidable. The most notable syntactic restriction guaranteeing chase
termination is weak acyclicity of TGDs, for which we refer the reader to
the milestone paper [35]; more general syntactic restrictions were studied
in [30] [48].

• Guardedness. This is the paradigm we propose in this paper. A thorough
study of the data complexity of query answering under GTGDs and linear
TGDs, a subset of the guarded class, is found in [15].

• Stickiness. The class of sticky sets of TGDs [16] (or sticky Datalog±) is
defined by means of syntactic restrictions on the rule bodies, which ensure
that each sticky set of TGDs is first-order rewritable, being a FUS, in the
parlance of [6].

The interaction between equality generating dependencies and TGDs has
been the subject of several works, starting from [35], which deals with functional
and inclusion dependencies, proposing a class of inclusion dependencies called
key-based, which, intuitively, has no interaction with functional dependencies
(key dependencies, in this particular case) thanks to syntactic restrictions. The
absence of interaction between EGDs and TGDs is captured by the notion of
separability, first introduced in [18] for key and inclusion dependencies, and also
adopted (though sometimes not explicitly stated), for instance, in [T^HHHS];
see [13] for a survey on the topic.

In ontological query answering, normally both finite and infinite models of
theories are considered. However, when dealing with databases, which are
always of finite size, it is customary to define query answering (see Definition [5])
only on finite instances. The property that ensures that answering under finite
and arbitrary (finite and infinite) models is equivalent is called finite
controllability, and it was proved for restricted classes of functional and inclusion
dependencies in [38]. Finite controllability was proved for the class of arbitrary
inclusion dependencies in a pioneering work by Rosati [57]. An even more
general result is shown in [TJ, where it is shown that finite controllability holds for
guarded theories.

A related previous approach to guarded logic programming is guarded open
answer set programming [37]. It is easy to see that a set of GTGDs can be
interpreted as a guarded answer set program as defined in [37], but that guarded
answer set programs are, in general, more expressive than GTGDs, for example,
because they allow for negation.

Implementations of ontology-based data access systems take advantage of
query answering techniques for tractable ontologies; in particular, we mention
DLV$^\exists$ 03], Mastro [S5J and NYAYA [SI].